NOVEL CBH1 HOMOLOGS AND VARIANT CBH1 CELLULASES

CROSS-REFERENCE TO RELATED APPLICATIONS 

[01] This application claims priority to U.S. Provisional Application No. 60/456,368 filed
March 21, 2003 (Attorney Docket No. GC793P) and to U.S. Provisional Application No.
60/458,696 filed March 27, 2003 (Attorney Docket No. GC793-2P), all herein incorporated by 
reference. 

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY 
SPONSORED RESEARCH AND DEVELOPMENT 

[02] Portions of this work were funded by Subcontract No. ZCO-0-30017-01 with the
National Renewable Energy Laboratory under Prime Contract No. DE-AC36-99GO10337
with the U.S. Department of Energy. Accordingly, the United States Government may have 
certain rights in this invention. 

FIELD OF THE INVENTION 

[03] This invention relates to homologs and variants of Hypocrea jecorina (Trichoderma 
reesei) CBH1. The present invention relates to isolated nucleic acid sequences which 
encode polypeptides having cellobiohydrolase activity. The invention also relates to nucleic 
acid constructs, vectors, and host cells comprising the nucleic acid sequences as well as 
methods for producing recombinant variant CBH polypeptides and novel homologs of H. 
jecorina CBH1. 

BACKGROUND OF THE INVENTION

[04] Cellulose and hemicellulose are the most abundant plant materials produced by 
photosynthesis. They can be degraded and used as an energy source by numerous 
microorganisms, including bacteria, yeast and fungi, that produce extracellular enzymes 
capable of hydrolysis of the polymeric substrates to monomeric sugars (Aro et al., 2001). As
the limits of non-renewable resources approach, the potential of cellulose to become a major
renewable energy resource is enormous (Krishna et al., 2001). The effective utilization of
cellulose through biological processes is one approach to overcoming the shortage of foods,
feeds, and fuels (Ohmiya et al., 1997).

[05] Cellulases are enzymes that hydrolyze cellulose (beta-1,4-glucan or beta-D-glucosidic
linkages), resulting in the formation of glucose, cellobiose, cellooligosaccharides,
and the like. Cellulases have been traditionally divided into three major classes:
endoglucanases (EC 3.2.1.4) ("EG"), exoglucanases or cellobiohydrolases (EC 3.2.1.91)
("CBH") and beta-glucosidases (beta-D-glucoside glucohydrolase; EC 3.2.1.21) ("BG")
(Knowles et al., 1987; Shulein, 1988). Endoglucanases act mainly on the amorphous parts
of the cellulose fibre, whereas cellobiohydrolases are also able to degrade crystalline
cellulose (Nevalainen and Penttila, 1995). Thus, the presence of a cellobiohydrolase in a
cellulase system is required for efficient solubilization of crystalline cellulose (Suurnakki et
al., 2000). Beta-glucosidase acts to liberate D-glucose units from cellobiose,
cello-oligosaccharides, and other glucosides (Freer, 1993).

[06] Cellulases are known to be produced by a large number of bacteria, yeast and fungi. 
Certain fungi produce a complete cellulase system capable of degrading crystalline forms of 
cellulose, such that the cellulases are readily produced in large quantities via fermentation. 
Filamentous fungi play a special role since many yeast, such as Saccharomyces cerevisiae, 
lack the ability to hydrolyze cellulose. See, e.g., Aro et al., 2001; Aubert et al., 1988; Wood
et al., 1988; and Coughlan et al.

[07] The fungal cellulase classifications of CBH, EG and BG can be further expanded to 
include multiple components within each classification. For example, multiple CBHs, EGs 
and BGs have been isolated from a variety of fungal sources including Trichoderma reesei 
which contains known genes for 2 CBHs, i.e., CBH1 and CBH II, at least 8 EGs, i.e., EG I,
EG II, EG III, EGIV, EGV, EGVI, EGVII and EGVIII, and at least 5 BGs, i.e., BG1, BG2,
BG3, BG4 and BG5.

[08] In order to efficiently convert crystalline cellulose to glucose the complete cellulase 
system comprising components from each of the CBH, EG and BG classifications is 
required, with isolated components less effective in hydrolyzing crystalline cellulose (Filho et
al., 1996). A synergistic relationship has been observed between cellulase components
from different classifications. In particular, the EG-type cellulases and CBH-type cellulases
synergistically interact to more efficiently degrade cellulose. See, e.g., Wood, 1985.

[09] Cellulases are known in the art to be useful in the treatment of textiles for the
purposes of enhancing the cleaning ability of detergent compositions, for use as a softening
agent, for improving the feel and appearance of cotton fabrics, and the like (Kumar et al.,
1997).

[10] Cellulase-containing detergent compositions with improved cleaning performance 
(US Pat. No. 4,435,307; GB App. Nos. 2,095,275 and 2,094,826) and for use in the 
treatment of fabric to improve the feel and appearance of the textile (US Pat. Nos. 
5,648,263, 5,691,178, and 5,776,757; GB App. No. 1,358,599; The Shizuoka Prefectural 
Hammamatsu Textile Industrial Research Institute Report, Vol. 24, pp. 54-61, 1986), have 
been described. 

[11] Hence, cellulases produced in fungi and bacteria have received significant attention. 
In particular, fermentation of Trichoderma spp. (e.g., Trichoderma longibrachiatum or 
Trichoderma reesei) has been shown to produce a complete cellulase system capable of 
degrading crystalline forms of cellulose. 

[12] Although cellulase compositions have been previously described, there remains a 
need for new and improved cellulase compositions for use in household detergents, 
stonewashing compositions or laundry detergents, etc. Cellulases that exhibit improved 
performance are of particular interest. 

BRIEF SUMMARY OF THE INVENTION 

[13] The invention provides an isolated cellulase protein, identified herein as a desired 
cellulase, and nucleic acids which encode the desired cellulase. The desired cellulase may 
be selected from the group consisting of a variant CBH1 from Hypocrea jecorina and a novel 
CBH1 from Hypocrea schweinitzii, Hypocrea orientalis, Trichoderma pseudokoningii or 
Trichoderma konilangbra. 

[14] A variant CBH1 cellulase is provided, wherein the variant comprises a substitution or 
deletion at a position corresponding to one or more of residues L6, S8, P13, Q17, G22, T24, 
Q27, T41, S47, N49, T59, T66, A68, C71, A77, G88, N89, A100, N103, A112, S113, L125, 
T160, Y171, Q186, E193, S195, C210, M213, L225, T226, P227, T232, E236, E239, G242, 
T246, D249, N250, R251, Y252, D257, D259, S278, T281, L288, E295, T296, S297, A299, 
N301, F311, L318, E325, N327, D329, T332, A336, S341, S342, F352, K354, T356, G359, 
D368, Y371, N373, T380, Y381, N384, V393, R394, V407, P412, T417, F418, G430, N436,
G440, P443, T445, Y466, T478, A481 and/or N490 in CBH1 from Hypocrea jecorina.

[15] In a second aspect, the variant CBH1 comprises a substitution at a position
corresponding to one or more of residues Q186(E), S195(A/F), E239S,
G242(H/Y/N/S/T/D/A), D249(K/L/Y/C/I/V/W/T/N/M), E325(S/T), T332(A/H/Y/L/K), and
P412(T/S/A). 
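
The shorthand used in the list above can be read mechanically: a token such as S195(A/F) names the wild-type residue (S), its position in H. jecorina CBH1 (195), and the residues that may replace it (A or F), while E239S is the single-substitution form. The following is a minimal Python sketch of a parser for that shorthand. The names Substitution and parse_substitution are hypothetical and are introduced here only for illustration; they are not part of the specification.

```python
import re
from typing import List, NamedTuple


class Substitution(NamedTuple):
    """One entry of the variant shorthand, e.g. S195(A/F)."""
    wild_type: str           # one-letter code of the residue in the parent enzyme
    position: int            # residue number in the parent sequence
    replacements: List[str]  # one-letter codes of the allowed substitutions


# Accepts either "S195(A/F)" (grouped alternatives) or "E239S" (single form).
_PATTERN = re.compile(r"^([A-Z])(\d+)(?:\(([A-Z](?:/[A-Z])*)\)|([A-Z]))$")


def parse_substitution(token: str) -> Substitution:
    """Parse one shorthand token into its three parts."""
    match = _PATTERN.match(token.strip())
    if match is None:
        raise ValueError(f"not a recognised substitution token: {token!r}")
    wild_type, position, group, single = match.groups()
    replacements = group.split("/") if group else [single]
    return Substitution(wild_type, int(position), replacements)


if __name__ == "__main__":
    for token in ("S195(A/F)", "E239S", "P412(T/S/A)"):
        print(parse_substitution(token))
```

A token that does not match either form raises ValueError instead of being guessed at, which seems the safer choice when a list has passed through text extraction.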

[16] In a second embodiment the invention provides a Hypocrea orientalis CBH1.
[17] In a third embodiment the invention provides a Hypocrea schweinitzii CBH1.
[18] In a fourth embodiment, there is provided a Trichoderma konilangbra CBH1.
[19] In a fifth embodiment, there is provided a Trichoderma pseudokoningii CBH1.
[20] In another embodiment of the invention, a nucleic acid that encodes an inventive 
desired cellulase is provided. In another embodiment, the DNA is in a vector. In a further 
embodiment, the vector is used to transform a host cell. 

[21] In another embodiment of this invention, a method for producing an inventive desired 
cellulase is provided. The method comprises the steps of culturing a host cell transformed 
with a nucleic acid encoding a desired cellulase in a suitable culture medium under suitable 
conditions to produce the desired cellulase and obtaining the desired cellulase so produced.

[22] In yet another embodiment of the invention, a detergent comprising a surfactant and
a desired cellulase is provided. In one aspect of this invention, the detergent is a laundry or
a dish detergent. In a second aspect of this invention, the desired CBH1 cellulase is used in
the treatment of a cellulose-containing textile, in particular, in the stonewashing of
indigo-dyed denim. Alternatively, the cellulase of this invention can be used as a feed additive, in
the treatment of wood pulp, and in the reduction of biomass to glucose.

BRIEF DESCRIPTION OF THE DRAWINGS 

[23] Figure 1 shows the nucleic acid (lower line) (SEQ ID NO:1) and amino acid (upper
line) (SEQ ID NO:2) sequence of the wild type Cel7A (CBH1) from H. jecorina.

[24] Figures 2A and 2B show the amino acid alignment of the Cel7A family members H.
jecorina (also referred to as T. reesei) (SEQ ID NO:2), H. orientalis (SEQ ID NO:5), H.
schweinitzii (SEQ ID NO:8), T. konilangbra (SEQ ID NO:11) and T. pseudokoningii (SEQ ID
NO:14). The consensus sequence is also shown.

[25] Figure 3 is the genomic DNA sequence for H. orientalis CBH1 (SEQ ID NO:3). 
Introns are in bold and underlined. 

[26] Figure 4 is the signal sequence (A) (SEQ ID NO:4) and mature amino acid sequence 
(B) (SEQ ID NO:5) for H. orientalis CBH1.

[27] Figure 5 is the genomic DNA sequence for H. schweinitzii CBH1 (SEQ ID NO:6).
Introns are in bold and underlined. 

[28] Figure 6 is the signal sequence (A) (SEQ ID NO:7) and mature amino acid sequence 
(B) (SEQ ID NO:8) for H. schweinitzii CBH1.

[29] Figure 7 is the genomic DNA sequence for T. konilangbra CBH1 (SEQ ID NO:9).
Introns are in bold and underlined.

[30] Figure 8 is the signal sequence (A) (SEQ ID NO:10) and mature amino acid
sequence (B) (SEQ ID NO:11) for T. konilangbra CBH1.

[31] Figure 9 is the genomic DNA sequence for T. pseudokoningii CBH1 (SEQ ID NO:12).
Introns are in bold and underlined.

[32] Figure 10 is the signal sequence (A) (SEQ ID NO:13) and mature amino acid
sequence (B) (SEQ ID NO:14) for T. pseudokoningii CBH1.

[33] Figure 11 is the pRAX1 vector. This vector is based on the plasmid pGAPT2, except that
a 5259 bp HindIII fragment of Aspergillus nidulans genomic DNA, the AMA1 sequence
(Molecular Microbiology 1996 19:565-574), was inserted. Bases 1 to 1134 contain the
Aspergillus niger glucoamylase gene promoter. Bases 3098 to 3356 and 4950 to 4971
contain the Aspergillus niger glucoamylase terminator. The Aspergillus nidulans pyrG gene was
inserted from 3357 to 4949 as a marker for fungal transformation. There is a multiple
cloning site (MCS) into which genes may be inserted.

[34] Figure 12 is the pRAXdes2 vector backbone. This vector is based on the plasmid 
vector pRAX1. A Gateway cassette has been inserted into pRAX1 vector (indicated by the 
arrow on the interior of the circular plasmid). This cassette contains recombination sequence 
attR1 and attR2 and the selection marker catH and ccdB. The vector has been made 
according to the manual given in Gateway™ Cloning Technology: version 1 page 34-38 and 
can only replicate in E. coli DB3.1 from Invitrogen; in other E. coli hosts the ccdB gene is 
lethal. First a PCR fragment is made with primers containing attB1/2 recombination 
sequences. This fragment is recombined with pDONR201 (commercially available from 
Invitrogen); this vector contains attP1/2 recombination sequences with catH and ccdB in 
between the recombination sites. The BP clonase enzymes from Invitrogen are used to 
recombine the PCR fragment into this so-called ENTRY vector. Clones with the PCR fragment
inserted can be selected at 50 µg/ml kanamycin because clones expressing ccdB do not
survive. The att sequences are now altered and are called attL1 and attL2. The second step is to
recombine this clone with the pRAXdes2 vector (containing attR1 and attR2, with catH and ccdB
in between the recombination sites). The LR clonase enzymes from Invitrogen are used to
recombine the insert from the ENTRY vector into the destination vector. Only pRAXCBH1
vectors are selected using 100 µg/ml ampicillin because ccdB is lethal and the ENTRY vector
is sensitive to ampicillin. By this method the expression vector is now prepared and can be
used to transform A. niger.
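
The two-step selection logic described for Figure 12 can be sketched as a small model. The Python below is illustrative only: the Plasmid class and the functions bp_reaction, lr_reaction and survives are hypothetical names, and the model keeps just three properties per construct (att site type, what lies between the att sites, and backbone resistance). It follows the general Gateway scheme, in which attB recombines with attP to give attL and attL recombines with attR to give attB; recombination by-products and real sequences are ignored.

```python
from dataclasses import dataclass

CCDB_CASSETTE = "catH-ccdB"  # counter-selection cassette between the att sites


@dataclass(frozen=True)
class Plasmid:
    name: str
    att_sites: str   # "attB", "attP", "attL" or "attR"
    cargo: str       # what sits between the two att sites
    resistance: str  # antibiotic the backbone resists ("" for a PCR fragment)


def bp_reaction(pcr_fragment: Plasmid, donor: Plasmid) -> Plasmid:
    """attB fragment x attP donor vector -> attL entry clone carrying the insert."""
    if (pcr_fragment.att_sites, donor.att_sites) != ("attB", "attP"):
        raise ValueError("BP reaction needs an attB fragment and an attP donor")
    return Plasmid("entry clone", "attL", pcr_fragment.cargo, donor.resistance)


def lr_reaction(entry: Plasmid, destination: Plasmid) -> Plasmid:
    """attL entry clone x attR destination vector -> attB expression clone."""
    if (entry.att_sites, destination.att_sites) != ("attL", "attR"):
        raise ValueError("LR reaction needs an attL entry and an attR destination")
    return Plasmid("expression clone", "attB", entry.cargo, destination.resistance)


def survives(plasmid: Plasmid, antibiotic: str, ccdb_tolerant_host: bool = False) -> bool:
    """A transformant grows only if it resists the drug and ccdB does not kill it."""
    if plasmid.resistance != antibiotic:
        return False
    if plasmid.cargo == CCDB_CASSETTE and not ccdb_tolerant_host:
        return False
    return True


# Constructs named in the text; the resistances follow the selections it describes.
pcr = Plasmid("cbh1 PCR fragment", "attB", "cbh1", "")
donor = Plasmid("pDONR201", "attP", CCDB_CASSETTE, "kanamycin")
destination = Plasmid("pRAXdes2", "attR", CCDB_CASSETTE, "ampicillin")

entry = bp_reaction(pcr, donor)
expression = lr_reaction(entry, destination)
```

In this model the unrecombined donor and destination vectors fail selection in an ordinary E. coli host because they still carry ccdB, and the entry clone fails ampicillin selection because its backbone resists only kanamycin, which is the reasoning the paragraph gives for recovering only the expression construct. Setting ccdb_tolerant_host=True corresponds to propagating the empty vectors in a ccdB-tolerant strain such as DB3.1.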

[35] Figure 13 provides an illustration of the pRAXdes2cbh1 vector which was used for 
expression of the nucleic acids encoding the CBH1 homologs or variants in Aspergillus. A 
nucleic acid encoding a CBH1 enzyme homolog or variant was cloned into the vector by 
homologous recombination of the att sequences. 

DETAILED DESCRIPTION 

[36] The invention will now be described in detail by way of reference only using the 
following definitions and examples. All patents and publications, including all sequences 
disclosed within such patents and publications, referred to herein are expressly incorporated 
by reference. 

[37] Unless defined otherwise herein, all technical and scientific terms used herein have 
the same meaning as commonly understood by one of ordinary skill in the art to which this 
invention belongs. Singleton, et al., Dictionary of Microbiology and Molecular 
Biology, 2d Ed., John Wiley and Sons, New York (1994), and Hale & Marham, The Harper 
Collins Dictionary of Biology, Harper Perennial, NY (1991) provide one of skill with a 
general dictionary of many of the terms used in this invention. Although any methods and 
materials similar or equivalent to those described herein can be used in the practice or 
testing of the present invention, the preferred methods and materials are described. 
Numeric ranges are inclusive of the numbers defining the range. Unless otherwise 
indicated, nucleic acids are written left to right in 5' to 3' orientation; amino acid sequences
are written left to right in amino to carboxy orientation, respectively. Practitioners are
particularly directed to Sambrook et al., 1989, and Ausubel FM et al., 1993, for definitions
and terms of the art. It is to be understood that this invention is not limited to the particular 
methodology, protocols, and reagents described, as these may vary. 
[38] The headings provided herein are not limitations of the various aspects or 
embodiments of the invention which can be had by reference to the specification as a whole. 
Accordingly, the terms defined immediately below are more fully defined by reference to the 
specification as a whole. 

[39] All publications cited herein are expressly incorporated herein by reference for the 
purpose of describing and disclosing compositions and methodologies which might be used 
in connection with the invention. 

I. DEFINITIONS

[40] "Cellulase," "cellulolytic enzymes" or "cellulase enzymes" means bacterial or fungal
exoglucanases or exocellobiohydrolases, and/or endoglucanases, and/or β-glucosidases.
These three different types of cellulase enzymes act synergistically to convert cellulose and 
its derivatives to glucose. 

[41] Many microbes make enzymes that hydrolyze cellulose, including the wood rotting 
fungus Trichoderma; the compost bacteria Thermomonospora, Bacillus, and Cellulomonas;
Streptomyces; and the fungi Humicola, Aspergillus and Fusarium. The enzymes made by
these microbes are mixtures of proteins with three types of actions useful in the conversion 
of cellulose to glucose: endoglucanases (EG), cellobiohydrolases (CBH), and beta- 
glucosidase. 

[42] A "desired cellulase" as used herein means any one of the following: 



a) a variant CBH1 from Hypocrea jecorina according to the
present invention;

b) a CBH1 homolog from H. orientalis;

c) a CBH1 homolog from H. schweinitzii;

d) a CBH1 homolog from T. konilangbra;

e) a CBH1 homolog from T. pseudokoningii; and

f) a polypeptide encoded by a nucleic acid that hybridizes with
the nucleic acid that encodes any one of a-e under stringent
conditions.

[43] A "desired cellulase-encoding nucleic acid" as used herein means any one of the 
following: 

a) a nucleic acid encoding a variant CBH1 from Hypocrea
jecorina according to the present invention;

b) a nucleic acid encoding a CBH1 homolog from H. orientalis
having the sequence shown in Figure 3;

c) a nucleic acid encoding a CBH1 homolog from H. schweinitzii
having the sequence shown in Figure 5;

d) a nucleic acid encoding a CBH1 homolog from T. konilangbra
having the sequence shown in Figure 7;

e) a nucleic acid encoding a CBH1 homolog from T.
pseudokoningii having the sequence shown in Figure 9; and

f) a nucleic acid that hybridizes with any one of the nucleic acids
provided for by a-e, above, under stringent conditions.
[44] "Variant" means a protein which is derived from a precursor protein (e.g., the native 
protein) by substitution of one or more amino acids at one or a number of different sites in 
the amino acid sequence. The preparation of an enzyme variant is preferably achieved by 
modifying a DNA sequence which encodes for the native protein, transformation of that DNA 
sequence into a suitable host, and expression of the modified DNA sequence to form the 
derivative enzyme or enzyme variant. The variant CBH1 enzyme of the invention includes 
peptides comprising altered amino acid sequences in comparison with a precursor enzyme 
amino acid sequence wherein the variant CBH enzyme retains the characteristic cellulolytic 
nature of the precursor enzyme but which may have altered properties in some specific 
aspect. For example, a variant CBH enzyme may have an increased pH optimum or 
increased temperature or oxidative stability but will retain its characteristic cellulolytic activity. 
[45] As used herein, the term "gene" means the segment of DNA involved in producing a 
polypeptide chain, that may or may not include regions preceding and following the coding 
region, e.g. 5' untranslated (5' UTR) or "leader" sequences and 3' UTR or "trailer"
sequences, as well as intervening sequences (introns) between individual coding segments 
(exons). 

[46] The "filamentous fungi" of the present invention are eukaryotic microorganisms and 
include all filamentous forms of the subdivision Eumycotina (see Alexopoulos, C. J. (1962), 
Introductory Mycology, New York: Wiley). These fungi are characterized by a vegetative 
mycelium with a cell wall composed of chitin, cellulose, and other complex polysaccharides. 
The filamentous fungi of the present invention are morphologically, physiologically, and 
genetically distinct from yeasts. Vegetative growth by filamentous fungi is by hyphal 
elongation and carbon catabolism is obligately aerobic. In contrast, vegetative growth by 
yeasts such as S. cerevisiae is by budding of a unicellular thallus, and carbon catabolism 
may be fermentative. S. cerevisiae has a prominent, very stable diploid phase, whereas 
diploids exist only briefly prior to meiosis in filamentous fungi, e.g., Aspergilli and 
Neurospora. Although pseudohyphal growth may be exhibited by yeast under certain 
conditions it is to be understood that this does not bring the yeast within the definition of 
filamentous fungi. S. cerevisiae has 17 chromosomes as opposed to 8 and 7 for A. nidulans
and N. crassa respectively. Further illustrations of differences between S. cerevisiae and 
filamentous fungi include the inability of S. cerevisiae to process Aspergillus and
Trichoderma introns and the inability to recognize many transcriptional regulators of
filamentous fungi (Innis, M. A. et al. (1985) Science, 228, 21-26). 
[47] The term "heterologous" when used with reference to portions of a nucleic acid 
indicates that the nucleic acid comprises two or more subsequences that are not normally 
found in the same relationship to each other in nature. For instance, the nucleic acid is 
typically recombinantly produced, having two or more sequences, e.g., from unrelated genes 
arranged to make a new functional nucleic acid, e.g., a promoter from one source and a 
coding region from another source. Similarly, a heterologous protein will often refer to two or 
more subsequences that are not found in the same relationship to each other in nature (e.g., 
a fusion protein). 

[48] A "heterologous" nucleic acid construct or sequence has a portion of the sequence 
which is not native to the cell in which it is expressed. Heterologous, with respect to a 
control sequence refers to a control sequence (i.e. promoter or enhancer) that does not 
function in nature to regulate the same gene the expression of which it is currently 
regulating. Generally, heterologous nucleic acid sequences are not endogenous to the cell 
or part of the genome in which they are present, and have been added to the cell, by 
infection, transfection, transformation, microinjection, electroporation, or the like. A 
"heterologous" nucleic acid construct may contain a control sequence/DNA coding sequence 
combination that is the same as, or different from a control sequence/DNA coding sequence 
combination found in the native cell. 

[49] The terms "isolated" or "purified" as used herein refer to a nucleic acid or amino acid 
that is removed from at least one component with which it is naturally associated. 
[50] As used herein, the term "promoter" refers to a nucleic acid sequence that functions
to direct transcription of a downstream gene. The promoter will generally be appropriate to 
the host cell in which the target gene is being expressed. The promoter together with other 
transcriptional and translational regulatory nucleic acid sequences (also termed "control 
sequences") are necessary to express a given gene. In general, the transcriptional and 
translational regulatory sequences include, but are not limited to, promoter sequences, 
ribosomal binding sites, transcriptional start and stop sequences, translational start and stop 
sequences, and enhancer or activator sequences. 

[51] Generally, a "promoter sequence" is a DNA sequence which is recognized by the 
particular filamentous fungus for expression purposes. A "constitutive" promoter is a 
promoter that is active under most environmental and developmental conditions. An 
"inducible" promoter is a promoter that is active under environmental or developmental 
regulation. An example of an inducible promoter useful in the present invention is the T.
reesei (H. jecorina) cbh1 promoter, which is deposited in GenBank under Accession Number
D86235. In another aspect the promoter is a cbh II or xylanase promoter from H. jecorina.
[52] Exemplary promoters include the promoter from the A. awamori or A. niger 
glucoamylase genes (Nunberg, J. H. et al. (1984) Mol. Cell. Biol. 4, 2306-2315; Boel, E. et 
al. (1984) EMBO J. 3, 1581-1585), the Mucor miehei carboxyl protease gene, the Hypocrea 
jecorina cellobiohydrolase I gene (Shoemaker, S. P. et al. (1984) European Patent 
Application No. EPO0137280A1), the A. nidulans trpC gene (Yelton, M. et al. (1984) Proc. 
Natl. Acad. Sci. USA 81, 1470-1474; Mullaney, E. J. et al. (1985) Mol. Gen. Genet. 199, 37- 
45) the A. nidulans alcA gene (Lockington, R. A. et al. (1986) Gene 33, 137-149), the A. 
nidulans tpiA gene (McKnight, G. L. et al. (1986) Cell 46, 143-147), the A. nidulans amdS 
gene (Hynes, M. J. et al. (1983) Mol. Cell Biol. 3, 1430-1439), the H. jecorina xln1 gene, the
H. jecorina cbh2 gene, the H. jecorina eg1 gene, the H. jecorina eg2 gene, the H. jecorina
eg3 gene, and higher eukaryotic promoters such as the SV40 early promoter (Barclay, S. L.
and E. Meller (1983) Molecular and Cellular Biology 3, 2117-2130). 

[53] A nucleic acid is "operably linked" when it is placed into a functional relationship with 
another nucleic acid sequence. For example, DNA encoding a secretory leader, i.e., a 
signal peptide, is operably linked to DNA for a polypeptide if it is expressed as a preprotein 
that participates in the secretion of the polypeptide; a promoter or enhancer is operably 
linked to a coding sequence if it affects the transcription of the sequence; or a ribosome 
binding site is operably linked to a coding sequence if it is positioned so as to facilitate 
translation. Generally, "operably linked" means that the DNA sequences being linked are 
contiguous, and, in the case of a secretory leader, contiguous and in reading phase. 
However, enhancers do not have to be contiguous. Linking is accomplished by ligation at 
convenient restriction sites. If such sites do not exist, the synthetic oligonucleotide adaptors 
or linkers are used in accordance with conventional practice. Thus, the term "operably 
linked" refers to a functional linkage between a nucleic acid expression control sequence 
(such as a promoter, or array of transcription factor binding sites) and a second nucleic acid 
sequence, wherein the expression control sequence directs transcription of the nucleic acid 
corresponding to the second sequence. 

[54] "Chimeric gene" or "heterologous nucleic acid construct", as defined herein refers to 
a non-native gene (i.e., one that has been introduced into a host) that may be composed of 
parts of different genes, including regulatory elements. A chimeric gene construct for 
transformation of a host cell is typically composed of a transcriptional regulatory region 
(promoter) operably linked to a heterologous protein coding sequence, or, in a selectable
marker chimeric gene, to a selectable marker gene encoding a protein conferring antibiotic 
resistance to transformed cells. A typical chimeric gene of the present invention, for 
transformation into a host cell, includes a transcriptional regulatory region that is constitutive 
or inducible, a protein coding sequence, and a terminator sequence. A chimeric gene 
construct may also include a second DNA sequence encoding a signal peptide if secretion of 
the target protein is desired. 

[55] The term "recombinant" when used with reference, e.g., to a cell, or nucleic acid, 
protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by 
the introduction of a heterologous nucleic acid or protein or the alteration of a native nucleic 
acid or protein, or that the cell is derived from a cell so modified. Thus, for example, 
recombinant cells express genes that are not found within the native (non-recombinant) form 
of the cell or express native genes that are otherwise abnormally expressed,
underexpressed or not expressed at all.

[56] The term "secretory signal sequence" denotes a DNA sequence that encodes a 
polypeptide (a "secretory peptide") that, as a component of a larger polypeptide, directs the 
larger polypeptide through a secretory pathway of a cell in which it is synthesized. The larger 
peptide is commonly cleaved to remove the secretory peptide during transit through the 
secretory pathway. 

[57] As used herein, the phrases "whole cellulase preparation" and "whole cellulase 
composition" are used interchangeably and refer to both naturally occurring and non- 
naturally occurring compositions. A "naturally occurring" composition is one produced by a 
naturally occurring source and which comprises one or more cellobiohydrolase-type, one or 
more endoglucanase-type, and one or more β-glucosidase components wherein each of
these components is found at the ratio produced by the source. A naturally occurring 
composition is one that is produced by an organism unmodified with respect to the 
cellulolytic enzymes such that the ratio of the component enzymes is unaltered from that 
produced by the native organism. 

[58] A "non-naturally occurring" composition encompasses those compositions produced 
by: (1) combining component cellulolytic enzymes either in a naturally occurring ratio or non- 
naturally occurring, i.e., altered, ratio; or (2) modifying an organism to overexpress or 
underexpress one or more cellulolytic enzyme; or (3) modifying an organism such that at 
least one cellulolytic enzyme is deleted. 






ATTORNEY DOCKET NO. GC793-3 
PATENT APPLICATION 



[59] "Equivalent residues" may also be defined by determining homology at the level of 
tertiary structure for a precursor cellulase whose tertiary structure has been determined by x- 
ray crystallography. Equivalent residues are defined as those for which the atomic 
coordinates of two or more of the main chain atoms of a particular amino acid residue of a 
cellulase and Hypocrea jecorina CBH (N on N, CA on CA, C on C and O on O) are within 
0.13 nm and preferably 0.1 nm after alignment. Alignment is achieved after the best model
has been oriented and positioned to give the maximum overlap of atomic coordinates of
non-hydrogen protein atoms of the cellulase in question to the H. jecorina CBH1. The best
model is the crystallographic model giving the lowest R factor for experimental diffraction 
data at the highest resolution available. 

$$R\ \text{factor} = \frac{\sum_h |F_o(h)| - |F_c(h)|}{\sum_h |F_o(h)|}$$



[60] Equivalent residues which are functionally analogous to a specific residue of H. 
jecorina CBH1 are defined as those amino acids of a cellulase which may adopt a 
conformation such that they either alter, modify or contribute to protein structure, substrate 
binding or catalysis in a manner defined and attributed to a specific residue of the H. jecorina 
CBH1. Further, they are those residues of the cellulase (for which a tertiary structure has 
been obtained by x-ray crystallography) which occupy an analogous position to the extent 
that, although the main chain atoms of the given residue may not satisfy the criteria of 
equivalence on the basis of occupying a homologous position, the atomic coordinates of at 
least two of the side chain atoms of the residue lie within 0.13 nm of the corresponding side
chain atoms of H. jecorina CBH. 
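
The distance criteria in the two preceding paragraphs can be illustrated with a short Python sketch. It assumes the two structures have already been superimposed and that coordinates are given in nanometres; the function names count_close_atoms and is_equivalent, and the dictionary layout mapping atom names to coordinates, are hypothetical choices made for this example. A residue is treated as equivalent when at least two of the named atoms lie within the cutoff (0.13 nm, or the preferred 0.1 nm).

```python
import math
from typing import Dict, Iterable, Tuple

Coord = Tuple[float, float, float]       # x, y, z in nanometres
Residue = Dict[str, Coord]               # atom name -> coordinates

MAIN_CHAIN_ATOMS = ("N", "CA", "C", "O")


def count_close_atoms(res_a: Residue, res_b: Residue,
                      atom_names: Iterable[str], cutoff_nm: float) -> int:
    """Count the named atoms whose positions in the two residues are within the cutoff."""
    close = 0
    for atom in atom_names:
        if atom in res_a and atom in res_b:
            if math.dist(res_a[atom], res_b[atom]) <= cutoff_nm:
                close += 1
    return close


def is_equivalent(res_a: Residue, res_b: Residue,
                  atom_names: Iterable[str] = MAIN_CHAIN_ATOMS,
                  cutoff_nm: float = 0.13, min_atoms: int = 2) -> bool:
    """Apply the 'two or more atoms within the cutoff' test after alignment."""
    return count_close_atoms(res_a, res_b, atom_names, cutoff_nm) >= min_atoms


# Made-up coordinates for one residue pair, for illustration only.
reference = {"N": (0.0, 0.0, 0.0), "CA": (0.146, 0.0, 0.0),
             "C": (0.2, 0.14, 0.0), "O": (0.3, 0.2, 0.0)}
candidate = {"N": (0.05, 0.0, 0.0), "CA": (0.146, 0.1, 0.0),
             "C": (0.2, 0.14, 0.3), "O": (0.6, 0.2, 0.0)}
```

Passing a tuple of side chain atom names instead of MAIN_CHAIN_ATOMS applies the same test to the side chain criterion given for functionally analogous residues.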

[61] The term "nucleic acid molecule" includes RNA, DNA and cDNA molecules. It will be 
understood that, as a result of the degeneracy of the genetic code, a multitude of nucleotide 
sequences encoding a given protein such as CBH1 may be produced. The present 
invention contemplates every possible variant nucleotide sequence, encoding CBH1, all of 
which are possible given the degeneracy of the genetic code. 
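
To give a sense of scale for "a multitude of nucleotide sequences", the number of distinct coding sequences for a peptide is the product of the number of synonymous codons at each residue. The Python sketch below computes this under the standard genetic code; the function name count_encodings is a hypothetical helper, and the short peptides used with it are arbitrary examples rather than sequences taken from the specification.

```python
from collections import Counter
from itertools import product
from math import prod
from typing import Dict

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons for each amino acid (stop codons excluded).
CODONS_PER_RESIDUE = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def count_encodings(peptide: str) -> int:
    """Return how many different DNA sequences encode the given peptide."""
    for aa in peptide:
        if aa not in CODONS_PER_RESIDUE:
            raise ValueError(f"unknown amino acid: {aa!r}")
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide)


if __name__ == "__main__":
    # Q has 2 codons, S has 6 and A has 4, so this prints 48.
    print(count_encodings("QSA"))
```

Because the count multiplies across residues, it grows very quickly with peptide length, which is why the paragraph speaks of contemplating every possible variant rather than listing them.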

[62] As used herein, the term "vector" refers to a nucleic acid construct designed for 
transfer between different host cells. An "expression vector" refers to a vector that has the 
ability to incorporate and express heterologous DNA fragments in a foreign cell. Many 
prokaryotic and eukaryotic expression vectors are commercially available. Selection of 
appropriate expression vectors is within the knowledge of those having skill in the art. 

[63] Accordingly, an "expression cassette" or "expression vector" is a nucleic acid
construct generated recombinantly or synthetically, with a series of specified nucleic acid 
elements that permit transcription of a particular nucleic acid in a target cell. The 
recombinant expression cassette can be incorporated into a plasmid, chromosome, 
mitochondrial DNA, plastid DNA, virus, or nucleic acid fragment. Typically, the recombinant 
expression cassette portion of an expression vector includes, among other sequences, a 
nucleic acid sequence to be transcribed and a promoter. 

[64] As used herein, the term "plasmid" refers to a circular double-stranded (ds) DNA 
construct used as a cloning vector, and which forms an extrachromosomal self-replicating 
genetic element in many bacteria and some eukaryotes. 

[65] As used herein, the term "selectable marker-encoding nucleotide sequence" refers to 
a nucleotide sequence which is capable of expression in cells and where expression of the 
selectable marker confers to cells containing the expressed gene the ability to grow in the 
presence of a corresponding selective agent, or under corresponding selective growth 
conditions. 

[66] In general, nucleic acid molecules which encode the variant CBH1 will hybridize, 
under moderate to high stringency conditions, to the wild type sequence provided herein as 
SEQ ID NO: (native H. jecorina CBH1). However, in some cases a CBH1-encoding 
nucleotide sequence is employed that possesses a substantially different codon usage, 
while the protein encoded by the CBH1-encoding nucleotide sequence has the same or 
substantially the same amino acid sequence as the native protein. For example, the coding 
sequence may be modified to facilitate faster expression of CBH1 in a particular prokaryotic 
or eukaryotic expression system, in accordance with the frequency with which a particular 
codon is utilized by the host. Te'o, et al. (2000), for example, describe the optimization of 
genes for expression in filamentous fungi. 
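
A minimal sketch of such codon replacement is given below. The table of host-preferred codons (PREFERRED) and the short coding sequence are hypothetical values chosen for illustration; they are not measured codon frequencies for any organism and are not part of the sequences disclosed herein.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical host codon preferences (illustrative only, not measured data).
PREFERRED = {"L": "CTC", "S": "TCC", "A": "GCC", "G": "GGC", "T": "ACC"}


def split_codons(dna):
    """Split a coding sequence into codons; its length must be a multiple of 3."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate a coding sequence into one-letter amino acid codes."""
    return "".join(CODON_TABLE[c] for c in split_codons(dna))


def recode(dna, preferred):
    """Swap each codon for the host-preferred synonym, when one is listed."""
    return "".join(preferred.get(CODON_TABLE[c], c) for c in split_codons(dna))


if __name__ == "__main__":
    original = "TTAAGTGCAGGTACT"          # hypothetical fragment: L S A G T
    optimized = recode(original, PREFERRED)
    print(optimized, translate(optimized) == translate(original))
```

The nucleotide sequence changes while the encoded amino acid sequence is preserved, which is the property relied upon in the paragraph above.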

[67] A nucleic acid sequence is considered to be "selectively hybridizable" to a reference 
nucleic acid sequence if the two sequences specifically hybridize to one another under 
moderate to high stringency hybridization and wash conditions. Hybridization conditions are 
based on the melting temperature (Tm) of the nucleic acid binding complex or probe. For 
example, "maximum stringency" typically occurs at about Tm-5°C (5° below the Tm of the 
probe); "high stringency" at about 5-10° below the Tm; "moderate" or "intermediate 
stringency" at about 10-20° below the Tm of the probe; and "low stringency" at about 20-25° 
below the Tm. Functionally, maximum stringency conditions may be used to identify 
sequences having strict identity or near-strict identity with the hybridization probe; while high 
stringency conditions are used to identify sequences having about 80% or more sequence 
identity with the probe. 
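
The stringency ranges recited above can be expressed, as a rough sketch, by a function that compares the hybridization temperature with the Tm of the probe. The exact handling of the boundaries (for example, treating anything up to 5° below the Tm as "maximum") is an assumption made for this illustration, since the ranges in the text are approximate.

```python
def classify_stringency(tm_c, hyb_temp_c):
    """Classify hybridization stringency from the probe Tm and the
    hybridization temperature, both in degrees Celsius.

    Boundary handling is an illustrative assumption: the ranges in the
    text are stated as "about" values.
    """
    delta = tm_c - hyb_temp_c  # degrees below the Tm
    if delta < 0:
        return "above Tm"
    if delta <= 5:
        return "maximum"
    if delta <= 10:
        return "high"
    if delta <= 20:
        return "moderate"
    if delta <= 25:
        return "low"
    return "below low stringency"


if __name__ == "__main__":
    # Hypothetical probe with a Tm of 70 degrees C.
    for temp in (65, 62, 55, 47):
        print(temp, classify_stringency(70, temp))
```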

[68] Moderate and high stringency hybridization conditions are well known in the art (see, 
for example, Sambrook, et al., 1989, Chapters 9 and 11, and in Ausubel, F.M., et al., 1993, 
expressly incorporated by reference herein). An example of high stringency conditions 
includes hybridization at about 42°C in 50% formamide, 5X SSC, 5X Denhardt's solution, 
0.5% SDS and 100 µg/ml denatured carrier DNA followed by washing two times in 2X SSC 
and 0.5% SDS at room temperature and two additional times in 0.1X SSC and 0.5% SDS at 
42°C. 

[69] As used herein, the terms "transformed", "stably transformed" or "transgenic" with 
reference to a cell means the cell has a non-native (heterologous) nucleic acid sequence 
integrated into its genome or as an episomal plasmid that is maintained through multiple 
generations. 

[70] As used herein, the term "expression" refers to the process by which a polypeptide is 
produced based on the nucleic acid sequence of a gene. The process includes both 
transcription and translation. 

[71] The term "introduced" in the context of inserting a nucleic acid sequence into a cell, 
means "transfection", or "transformation" or "transduction" and includes reference to the 
incorporation of a nucleic acid sequence into a eukaryotic or prokaryotic cell where the 
nucleic acid sequence may be incorporated into the genome of the cell (for example, 
chromosome, plasmid, plastid, or mitochondrial DNA), converted into an autonomous 
replicon, or transiently expressed (for example, transfected mRNA). 
[72] It follows that the term "desired cellulase expression" refers to transcription and 
translation of the desired cellulase gene, the products of which include precursor RNA, 
mRNA, polypeptide, and post-translationally processed polypeptides. By way of example, 
assays for CBH1 expression include Western blot for CBH1 protein, Northern blot analysis 
and reverse transcriptase polymerase chain reaction (RT-PCR) assays for CBH1 mRNA, 
and endoglucanase activity assays as described in Shoemaker S.P. and Brown R.D.Jr. 
(Biochim. Biophys. Acta, 1978, 523:133-146) and Schulein (1988). 
[73] The term "alternative splicing" refers to the process whereby multiple polypeptide 
isoforms are generated from a single gene, and involves the splicing together of 
nonconsecutive exons during the processing of some, but not all, transcripts of the gene. 
Thus a particular exon may be connected to any one of several alternative exons to form 
messenger RNAs. The alternatively-spliced mRNAs produce polypeptides ("splice variants") 
in which some parts are common while other parts are different. 
[74] By the term "host-cell" is meant a cell that contains a vector and supports the 
replication, and/or transcription or transcription and translation (expression) of the 
expression construct. Host cells for use in the present invention can be prokaryotic cells, 
such as E. coli, or eukaryotic cells such as yeast, plant, insect, amphibian, or mammalian 
cells. In general, host cells are filamentous fungi. 

[75] The term "cellulase" refers to a category of enzymes capable of hydrolyzing cellulose 
polymers to shorter cello-oligosaccharide oligomers, cellobiose and/or glucose. Numerous 
examples of cellulases, such as exoglucanases, exocellobiohydrolases, endoglucanases, 
and glucosidases have been obtained from cellulolytic organisms, particularly including 
fungi, plants and bacteria. 

[76] CBH1 from Hypocrea jecorina is a member of the Glycosyl Hydrolase Family 7 
(hence Cel7) and, specifically, was the first member of that family identified in Hypocrea 
jecorina (hence Cel7A). The Glycosyl Hydrolase Family 7 contains both Endoglucanases 
and Cellobiohydrolases/exoglucanases, and CBH1 is the latter. Thus, the phrases 
CBH1, CBH1-type protein and Cel7 cellobiohydrolases may be used interchangeably herein. 
[77] The term "cellulose binding domain" as used herein refers to a portion of the amino 
acid sequence of a cellulase or a region of the enzyme that is involved in the cellulose 
binding activity of a cellulase or derivative thereof. Cellulose binding domains or modules 
generally function by non-covalently binding the cellulase to cellulose, a cellulose derivative 
or other polysaccharide equivalent thereof. Cellulose binding domains permit or facilitate 
hydrolysis of cellulose fibers by the structurally distinct catalytic core region, and typically 
function independent of the catalytic core. Thus, a cellulose binding domain will not possess 
the significant hydrolytic activity attributable to a catalytic core. In other words, a cellulose 
binding domain is a structural element of the cellulase enzyme protein tertiary structure that 
is distinct from the structural element which possesses catalytic activity. 
[78] As used herein, the term "surfactant" refers to any compound generally recognized in 
the art as having surface active qualities. Thus, for example, surfactants comprise anionic, 
cationic and nonionic surfactants such as those commonly found in detergents. Anionic 
surfactants include linear or branched alkylbenzenesulfonates; alkyl or alkenyl ether sulfates 
having linear or branched alkyl groups or alkenyl groups; alkyl or alkenyl sulfates; 
olefinsulfonates; and alkanesulfonates. Ampholytic surfactants include quaternary 
ammonium salt sulfonates, and betaine-type ampholytic surfactants. Such ampholytic 
surfactants have both the positive and negative charged groups in the same molecule. 
Nonionic surfactants may comprise polyoxyalkylene ethers, as well as higher fatty acid 
alkanolamides or alkylene oxide adduct thereof, fatty acid glycerine monoesters, and the 
like. 

[79] As used herein, the term "cellulose containing fabric" refers to any sewn or unsewn 
fabrics, yarns or fibers made of cotton or non-cotton containing cellulose or cotton or non- 
cotton containing cellulose blends including natural cellulosics and manmade cellulosics 
(such as jute, flax, ramie, rayon, and lyocell). 

[80] As used herein, the term "cotton-containing fabric" refers to sewn or unsewn fabrics, 
yarns or fibers made of pure cotton or cotton blends including cotton woven fabrics, cotton 
knits, cotton denims, cotton yarns, raw cotton and the like. 

[81] As used herein, the term "stonewashing composition" refers to a formulation for use 
in stonewashing cellulose containing fabrics. Stonewashing compositions are used to 
modify cellulose containing fabrics prior to sale, i.e., during the manufacturing process. In 
contrast, detergent compositions are intended for the cleaning of soiled garments and are 
not used during the manufacturing process. 

[82] As used herein, the term "detergent composition" refers to a mixture which is 
intended for use in a wash medium for the laundering of soiled cellulose containing fabrics. 
In the context of the present invention, such compositions may include, in addition to 
cellulases and surfactants, additional hydrolytic enzymes, builders, bleaching agents, bleach 
activators, bluing agents and fluorescent dyes, caking inhibitors, masking agents, cellulase 
activators, antioxidants, and solubilizers. 

[83] As used herein, the term "decrease or elimination in expression of the cbh1 gene" 
means either that the cbh1 gene has been deleted from the genome and therefore 
cannot be expressed by the recombinant host microorganism; or that the cbh1 gene has 
been modified such that a functional CBH1 enzyme is not produced by the host 
microorganism. 

[84] The term "variant cbh1 gene" or "variant CBH1" means, respectively, that the nucleic 
acid sequence of the cbh1 gene from H. jecorina has been altered by removing, adding, 
and/or manipulating the coding sequence or the amino acid sequence of the expressed 
protein has been modified consistent with the invention described herein. 
[85] As used herein, the terms "active" and "biologically active" refer to a biological activity 
associated with a particular protein and are used interchangeably herein. For example, the 
enzymatic activity associated with a protease is proteolysis and, thus, an active protease has 
proteolytic activity. It follows that the biological activity of a given protein refers to any 
biological activity typically attributed to that protein by those of skill in the art. 
[86] When employed in enzymatic solutions, the homolog or variant CBH1 component is 
generally added in an amount sufficient to allow the highest rate of release of soluble sugars 
from the biomass. The amount of homolog or variant CBH1 component added depends 
upon the type of biomass to be saccharified which can be readily determined by the skilled 
artisan. However, when employed, the weight percent of the homolog or variant CBH1 
component relative to any EG type components present in the cellulase composition is from 
preferably about 1, preferably about 5, preferably about 10, preferably about 15, or 
preferably about 20 weight percent to preferably about 25, preferably about 30, preferably 
about 35, preferably about 40, preferably about 45 or preferably about 50 weight percent. 
Furthermore, preferred ranges may be about 0.5 to about 15 weight percent, about 0.5 to 
about 20 weight percent, from about 1 to about 10 weight percent, from about 1 to about 15 
weight percent, from about 1 to about 20 weight percent, from about 1 to about 25 weight 
percent, from about 5 to about 20 weight percent, from about 5 to about 25 weight percent, 
from about 5 to about 30 weight percent, from about 5 to about 35 weight percent, from 
about 5 to about 40 weight percent, from about 5 to about 45 weight percent, from about 5 to 
about 50 weight percent, from about 10 to about 20 weight percent, from about 10 to about 
25 weight percent, from about 10 to about 30 weight percent, from about 10 to about 35 
weight percent, from about 10 to about 40 weight percent, from about 10 to about 45 weight 
percent, from about 10 to about 50 weight percent, from about 15 to about 20 weight 
percent, from about 15 to about 25 weight percent, from about 15 to about 30 weight 
percent, from about 15 to about 35 weight percent, from about 15 to about 40 weight 
percent, from about 15 to about 45 weight percent, from about 15 to about 50 weight 
percent. 

II. HOST ORGANISMS 

[87] Filamentous fungi include all filamentous forms of the subdivision Eumycota and 
Oomycota. The filamentous fungi are characterized by vegetative mycelium having a cell 
wall composed of chitin, glucan, chitosan, mannan, and other complex polysaccharides, with 
vegetative growth by hyphal elongation and carbon catabolism that is obligately aerobic. 
[88] In the present invention, the filamentous fungal parent cell may be a cell of a species 
of, but not limited to, Trichoderma, e.g., Trichoderma longibrachiatum, Trichoderma viride, 
Trichoderma koningii, Trichoderma harzianum; Penicillium sp.; Humicola sp., including 
Humicola insolens and Humicola grisea; Chrysosporium sp., including C. lucknowense; 
Gliocladium sp.; Aspergillus sp.; Fusarium sp., Neurospora sp., Hypocrea sp., and 
Emericella sp. As used herein, the term "Trichoderma" or "Trichoderma sp." refers to any 
fungal strains which have previously been classified as Trichoderma or are currently 
classified as Trichoderma. 

[89] In one preferred embodiment, the filamentous fungal parent cell is an Aspergillus 
niger, Aspergillus awamori, Aspergillus aculeatus, or Aspergillus nidulans cell. 
[90] In another preferred embodiment, the filamentous fungal parent cell is a Trichoderma 
reesei cell. 

III. CELLULASES 

[91] Cellulases are known in the art as enzymes that hydrolyze cellulose (beta-1,4-glucan 
or beta D-glucosidic linkages) resulting in the formation of glucose, cellobiose, 
cellooligosaccharides, and the like. As set forth above, cellulases have been traditionally 
divided into three major classes: endoglucanases (EC 3.2.1.4) ("EG"), exoglucanases or 
cellobiohydrolases (EC 3.2.1.91) ("CBH") and beta-glucosidases (EC 3.2.1.21) ("BG"). 
(Knowles, et a/., 1987; Schulein, 1988). 

[92] Certain fungi produce complete cellulase systems which include exo- 
cellobiohydrolases or CBH-type cellulases, endoglucanases or EG-type cellulases and beta- 
glucosidases or BG-type cellulases (Schulein, 1988). However, sometimes these systems 
lack CBH-type cellulases and bacterial cellulases also typically include little or no CBH-type 
cellulases. In addition, it has been shown that the EG components and CBH components 
synergistically interact to more efficiently degrade cellulose. See, e.g., Wood, 1985. The 
different components, i.e., the various endoglucanases and exocellobiohydrolases in a multi- 
component or complete cellulase system, generally have different properties, such as 
isoelectric point, molecular weight, degree of glycosylation, substrate specificity and 
enzymatic action patterns. 

[93] Cellulase compositions have also been shown to degrade cotton-containing fabrics, 
resulting in strength loss in the fabric (U.S. Patent No. 4,822,516), contributing to 
reluctance to use cellulase compositions in commercial detergent applications. Cellulase 
compositions comprising endoglucanase components have been suggested to exhibit 
reduced strength loss for cotton-containing fabrics as compared to compositions comprising 
a complete cellulase system. 
[94] Cellulases have also been shown to be useful in degradation of cellulose biomass to 
ethanol (wherein the cellulase degrades cellulose to glucose and yeast or other microbes 
further ferment the glucose into ethanol), in the treatment of mechanical pulp (Pere et al., 
1996), for use as a feed additive (WO 91/04673) and in grain wet milling. 
[95] Most CBHs and EGs have a multidomain structure consisting of a core domain 
separated from a cellulose binding domain (CBD) by a linker peptide (Suurnakki et al., 
2000). The core domain contains the active site whereas the CBD interacts with cellulose by 
binding the enzyme to it (van Tilbeurgh et al., 1986; Tomme et al., 1988). The CBDs are 
particularly important in the hydrolysis of crystalline cellulose. It has been shown that the 
ability of cellobiohydrolases to degrade crystalline cellulose clearly decreases when the CBD 
is absent (Linder and Teeri, 1997). However, the exact role and action mechanism of CBDs 
is still a matter of speculation. It has been suggested that the CBD enhances the enzymatic 
activity merely by increasing the effective enzyme concentration at the surface of cellulose 
(Stahlberg et al., 1991), and/or by loosening single cellulose chains from the cellulose 
surface (Tormo et al., 1996). Most studies concerning the effects of cellulase domains on 
different substrates have been carried out with core proteins of cellobiohydrolases, as their 
core proteins can easily be produced by limited proteolysis with papain (Tomme et al., 
1988). Numerous cellulases have been described in the scientific literature, examples of 
which include: from Trichoderma reesei: Shoemaker, S. et al., Bio/Technology, 1:691-696, 
1983, which discloses CBHI; Teeri, T. et al., Gene, 51:43-52, 1987, which discloses CBHII. 
Cellulases from species other than Trichoderma have also been described, e.g., Ooi et al., 
1990, which discloses the cDNA sequence coding for endoglucanase F1-CMC produced by 
Aspergillus aculeatus; Kawaguchi T et al., 1996, which discloses the cloning and sequencing 
of the cDNA encoding beta-glucosidase 1 from Aspergillus aculeatus; Sakamoto et al., 1995, 
which discloses the cDNA sequence encoding the endoglucanase CMCase-1 from 
Aspergillus kawachii IFO 4308; Saarilahti et al., 1990, which discloses an endoglucanase 
from Erwinia carotovora; Spilliaert R, et al., 1994, which discloses the cloning and 
sequencing of bglA, coding for a thermostable beta-glucanase from Rhodothermus marinus; 
and Halldorsdottir S et al., 1998, which discloses the cloning, sequencing and 
overexpression of a Rhodothermus marinus gene encoding a thermostable cellulase of 
glycosyl hydrolase family 12. However, there remains a need for identification and 
characterization of novel cellulases, with improved properties, such as improved 
performance under conditions of thermal stress or in the presence of surfactants, increased 
specific activity, altered substrate cleavage pattern, and/or high level expression in vitro. 
[96] The development of new and improved cellulase compositions that comprise varying 
amounts of CBH-type cellulase is of interest for use: (1) in detergent compositions that exhibit 
enhanced cleaning ability, function as a softening agent and/or improve the feel of cotton 
fabrics (e.g., "stone washing" or "biopolishing"); (2) in compositions for degrading wood pulp 
or other biomass into sugars (e.g., for bio-ethanol production); and/or (3) in feed 
compositions. 

IV. MOLECULAR BIOLOGY 

[97] In one embodiment this invention provides for the expression of desired cellulase 
genes under control of a promoter functional in a filamentous fungus. Therefore, this 
invention relies on routine techniques in the field of recombinant genetics. Basic texts 
disclosing the general methods of use in this invention include Sambrook et al., Molecular 
Cloning, A Laboratory Manual (2nd ed. 1989); Kriegler, Gene Transfer and Expression: A 
Laboratory Manual (1990); and Ausubel et al., eds., Current Protocols in Molecular Biology 
(1994). 

A. Methods for Identifying Homologous CBH1 Genes 

[98] The nucleic acid sequence for the wild type H. jecorina CBH1 is shown in Figure 1. 
The invention, in one aspect, encompasses a nucleic acid molecule encoding a CBH1 
homolog described herein. The nucleic acid may be a DNA molecule. 
[99] Techniques that can be used to isolate homologous CBH1-encoding DNA sequences 
are well known in the art and include, but are not limited to, cDNA and/or genomic library 
screening with homologous DNA probes and expression screening with activity assays or 
antibodies against CBH1. Any of these methods can be found in Sambrook, et al. or in 
Current Protocols In Molecular Biology, F. Ausubel, et al., ed. Greene Publishing and 
Wiley-Interscience, New York (1987) ("Ausubel"). 

B. Methods of Mutating CBH Nucleic Acid Sequences 

[100] Any method known in the art that can introduce mutations is contemplated by the 
present invention. 

[101] The present invention relates to the expression, purification and/or isolation and use 
of variant CBH1. These enzymes are preferably prepared by recombinant methods utilizing 
the cbh gene from H. jecorina. 

[102] After the isolation and cloning of the cbh1 gene from H. jecorina, other methods 
known in the art, such as site directed mutagenesis, are used to make the substitutions, 
additions or deletions that correspond to substituted amino acids in the expressed CBH1 
variant. Again, site directed mutagenesis and other methods of incorporating amino acid 
changes in expressed proteins at the DNA level can be found in Sambrook, et al. and 
Ausubel, et al. 

[103] DNA encoding an amino acid sequence variant of the H. jecorina CBH1 is prepared 
by a variety of methods known in the art. These methods include, but are not limited to, 
preparation by site-directed (or oligonucleotide-mediated) mutagenesis, PCR mutagenesis, 
and cassette mutagenesis of an earlier prepared DNA encoding the H. jecorina CBH1. 
[104] Site-directed mutagenesis is a preferred method for preparing substitution variants. 
This technique is well known in the art (see, e.g., Carter et al., Nucleic Acids Res. 13:4431- 
4443 (1985) and Kunkel et al., Proc. Natl. Acad. Sci. USA 82:488 (1987)). Briefly, in carrying 
out site-directed mutagenesis of DNA, the starting DNA is altered by first hybridizing an 
oligonucleotide encoding the desired mutation to a single strand of such starting DNA. After 
hybridization, a DNA polymerase is used to synthesize an entire second strand, using the 
hybridized oligonucleotide as a primer, and using the single strand of the starting DNA as a 
template. Thus, the oligonucleotide encoding the desired mutation is incorporated in the 
resulting double-stranded DNA. 
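
The design of a mutagenic oligonucleotide for a single codon substitution may be sketched as follows. The template sequence, the choice of a glycine-to-alanine change, and the flank length are hypothetical. For readability the oligonucleotide is written in the same sense as the template string; in practice the oligonucleotide is complementary to the single-stranded template to which it hybridizes.

```python
def mutagenic_oligo(template, codon_index, new_codon, flank=9):
    """Return an oligonucleotide carrying `new_codon` at the given codon
    position, flanked by up to `flank` template-matching bases per side."""
    start = codon_index * 3
    if len(new_codon) != 3 or start + 3 > len(template):
        raise ValueError("invalid codon or codon position")
    left = max(0, start - flank)
    right = min(len(template), start + 3 + flank)
    return template[left:start] + new_codon + template[start + 3:right]


def apply_mutation(template, codon_index, new_codon):
    """Return the sequence expected after the oligonucleotide is incorporated."""
    start = codon_index * 3
    if len(new_codon) != 3 or start + 3 > len(template):
        raise ValueError("invalid codon or codon position")
    return template[:start] + new_codon + template[start + 3:]


if __name__ == "__main__":
    # Hypothetical 7-codon template: ATG GCT AGC GGT ACC AAG CTT
    template = "ATGGCTAGCGGTACCAAGCTT"
    # Replace the fourth codon (GGT, glycine) with GCT (alanine).
    oligo = mutagenic_oligo(template, 3, "GCT", flank=6)
    mutant = apply_mutation(template, 3, "GCT")
    print(oligo, mutant)
```

The flanking bases that match the template allow the oligonucleotide to hybridize despite the central mismatch, after which second-strand synthesis incorporates the mutation.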

[105] PCR mutagenesis is also suitable for making amino acid sequence variants of the 
starting polypeptide, i.e., H. jecorina CBH1. See Higuchi, in PCR Protocols, pp. 177-183 
(Academic Press, 1990); and Vallette et al., Nuc. Acids Res. 17:723-733 (1989). Briefly, 
when small amounts of template DNA are used as starting material in a PCR, primers that 
differ slightly in sequence from the corresponding region in a template DNA can be used to 
generate relatively large quantities of a specific DNA fragment that differs from the template 
sequence only at the positions where the primers differ from the template. 
[106] Another method for preparing variants, cassette mutagenesis, is based on the 
technique described by Wells et al., Gene 34:315-323 (1985). The starting material is the 
plasmid (or other vector) comprising the starting polypeptide DNA to be mutated. The 
codon(s) in the starting DNA to be mutated are identified. There must be a unique restriction 
endonuclease site on each side of the identified mutation site(s). If no such restriction sites 
exist, they may be generated using the above-described oligonucleotide-mediated 
mutagenesis method to introduce them at appropriate locations in the starting polypeptide 
DNA. The plasmid DNA is cut at these sites to linearize it. A double-stranded 
oligonucleotide encoding the sequence of the DNA between the restriction sites but 
containing the desired mutation(s) is synthesized using standard procedures, wherein the 
two strands of the oligonucleotide are synthesized separately and then hybridized together 
using standard techniques. This double-stranded oligonucleotide is referred to as the 
cassette. This cassette is designed to have 5' and 3' ends that are compatible with the ends 
of the linearized plasmid, such that it can be directly ligated to the plasmid. This plasmid 
now contains the mutated DNA sequence. 
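
The cassette replacement described above can be modeled, in simplified form, as a string operation. In this sketch the plasmid is treated as a linear string, the restriction sites are retained on both sides of the cassette, and overhang geometry is ignored; these are simplifying assumptions. The plasmid fragment and the choice of EcoRI and BamHI recognition sequences are hypothetical.

```python
def cassette_replace(plasmid, site_5, site_3, cassette):
    """Replace the DNA between two unique restriction sites with `cassette`.

    Raises ValueError if either site is not unique or the sites are out of
    order, mirroring the requirement for a unique site on each side.
    """
    for site in (site_5, site_3):
        if plasmid.count(site) != 1:
            raise ValueError(f"site {site} must occur exactly once")
    start = plasmid.index(site_5) + len(site_5)
    end = plasmid.index(site_3)
    if start > end:
        raise ValueError("5' site must lie upstream of 3' site")
    return plasmid[:start] + cassette + plasmid[end:]


if __name__ == "__main__":
    ECORI, BAMHI = "GAATTC", "GGATCC"
    # Hypothetical fragment: the GGT codon between the sites is to be mutated.
    plasmid = "TT" + ECORI + "ATGGGTACC" + BAMHI + "AA"
    print(cassette_replace(plasmid, ECORI, BAMHI, "ATGGCTACC"))
```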

[107] Alternatively, or additionally, the desired amino acid sequence encoding a desired 
cellulase can be determined, and a nucleic acid sequence encoding such amino acid 
sequence variant can be generated synthetically. 

[108] The desired cellulase(s) so prepared may be subjected to further modifications, 
oftentimes depending on the intended use of the cellulase. Such modifications may involve 
further alteration of the amino acid sequence, fusion to heterologous polypeptide(s) and/or 
covalent modifications. 

V. cbh1 Nucleic Acids And CBH1 Polypeptides 

A. Variant cbh-type Nucleic acids 
[109] After DNA sequences that encode the CBH1 variants have been cloned into DNA 
constructs, the DNA is used to transform microorganisms. The microorganism to be 
transformed for the purpose of expressing a variant CBH1 according to the present invention 
may advantageously comprise a strain derived from Trichoderma sp. Thus, a preferred 
mode for preparing variant CBH1 cellulases according to the present invention comprises 
transforming a Trichoderma sp. host cell with a DNA construct comprising at least a 
fragment of DNA encoding a portion or all of the variant CBH1. The DNA construct will 
generally be functionally attached to a promoter. The transformed host cell is then grown 
under conditions so as to express the desired protein. Subsequently, the desired protein 
product is purified to substantial homogeneity. 

[110] However, it may in fact be that the best expression vehicle for a given DNA encoding 
a variant CBH1 may differ from H. jecorina. Thus, it may be that it will be most 
advantageous to express a protein in a transformation host that bears phylogenetic similarity 
to the source organism for the variant CBH1. In an alternative embodiment, Aspergillus 
niger can be used as an expression vehicle. For a description of transformation techniques 
with A. niger, see WO 98/31821, the disclosure of which is incorporated by reference in its 
entirety. 

[111] Accordingly, the present description of a Trichoderma spp. expression system is 
provided for illustrative purposes only and as one option for expressing the variant CBH1 of 
the invention. One of skill in the art, however, may be inclined to express the DNA encoding 
variant CBH1 in a different host cell if appropriate and it should be understood that the 
source of the variant CBH1 should be considered in determining the optimal expression 
host. Additionally, the skilled worker in the field will be capable of selecting the best 
expression system for a particular gene through routine techniques utilizing the tools 
available in the art. 

B. Variant CBH1 Polypeptides 
[112] The amino acid sequence for the wild type H. jecorina CBH1 is shown in Figure 1. 
The variant CBH1 polypeptides comprise a substitution or deletion at a position 
corresponding to one or more of residues L6, S8, P13, Q17, G22, T24, Q27, T41, S47, N49, 
T59, T66, A68, C71, A77, G88, N89, A100, N103, A112, S113, L125, T160, Y171, Q186, 
E193, S195, C210, M213, L225, T226, P227, T232, E236, E239, G242, T246, D249, N250, 
R251, Y252, D257, D259, S278, T281, L288, E295, T296, S297, A299, N301, F311, L318, 
E325, N327, D329, T332, A336, S341, S342, F352, K354, T356, G359, D368, Y371, N373, 
T380, Y381, N384, V393, R394, V407, P412, T417, F418, G430, N436, G440, P443, T445, 
Y466, T478, A481 and/or N490 in CBH1 from Hypocrea jecorina. 
[113] The variant CBH1s of this invention have amino acid sequences that are derived 
from the amino acid sequence of a precursor H. jecorina CBH1. The amino acid sequence 
of the CBH1 variant differs from the precursor CBH1 amino acid sequence by the 
substitution, deletion or insertion of one or more amino acids of the precursor amino acid 
sequence. The mature amino acid sequence of H. jecorina CBH1 is shown in Figure 1. 
Thus, this invention is directed to CBH1 variants which contain amino acid residues at 
positions which are equivalent to the particular identified residue in H. jecorina CBH1. A 
residue (amino acid) of a CBH1 variant is equivalent to a residue of Hypocrea jecorina 
CBH1 if it is either homologous (i.e., corresponding in position in either primary or tertiary 
structure) or is functionally analogous to a specific residue or portion of that residue in 
Hypocrea jecorina CBH1 (i.e., having the same or similar functional capacity to combine, 
react, or interact chemically or structurally). As used herein, numbering is intended to 
correspond to that of the mature CBH1 amino acid sequence as illustrated in Figure 1. In 
addition to locations within the precursor CBH1, specific residues in the precursor CBH1 
corresponding to the amino acid positions that are responsible for instability when the 
precursor CBH1 is under thermal stress are identified herein for substitution or deletion. The 
amino acid position number (e.g., +51) refers to the number assigned to the mature 
Hypocrea jecorina CBH1 sequence presented in Figure 1. 
[114] Alignment of amino acid sequences to determine homology is preferably determined 
by using a "sequence comparison algorithm." Optimal alignment of sequences for 
comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, 
Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & 
Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & 
Lipman, Proc. Nat'l Acad. Sci. USA 85:2444 (1988), by computerized implementations of 
these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software 
Package, Genetics Computer Group, 575 Science Dr., Madison, WI), by visual inspection or 
MOE by Chemical Computing Group, Montreal Canada. 
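
As a minimal sketch of the global alignment approach of Needleman & Wunsch referred to above, the following implementation uses a simple scoring scheme (match +1, mismatch -1, linear gap penalty -1). That scoring scheme is an assumption made for illustration; the packaged implementations listed above use substitution matrices and more elaborate gap penalties. The sequences in the usage example are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align `a` and `b`; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Once two sequences are aligned in this way, a residue position in one sequence can be mapped to the equivalent position in the other by reading down the aligned columns.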

[115] An example of an algorithm that is suitable for determining sequence similarity is the 
BLAST algorithm, which is described in Altschul, et al., J. Mol. Biol. 215:403-410 (1990). 
Software for performing BLAST analyses is publicly available through the National Center for 
Biotechnology Information (<www.ncbi.nlm.nih.gov>). This algorithm involves first identifying 
high scoring sequence pairs (HSPs) by identifying short words of length W in the query 
sequence that either match or satisfy some positive-valued threshold score T when aligned 
with a word of the same length in a database sequence. These initial neighborhood word 
hits act as starting points to find longer HSPs containing them. The word hits are expanded 
in both directions along each of the two sequences being compared for as far as the 
cumulative alignment score can be increased. Extension of the word hits is stopped when: 
the cumulative alignment score falls off by the quantity X from a maximum achieved value; 
the cumulative score goes to zero or below; or the end of either sequence is reached. The 
BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the 
alignment. The BLAST program uses as defaults a word length (W) of 11, the BLOSUM62 
scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1989)), 
alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands. 
[116] The BLAST algorithm then performs a statistical analysis of the similarity between 
two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5787 
(1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum 
probability (P(N)), which provides an indication of the probability by which a match between 
two nucleotide or amino acid sequences would occur by chance. For example, an amino 
acid sequence is considered similar to a protease if the smallest sum probability in a 
comparison of the test amino acid sequence to a protease amino acid sequence is less than 
about 0.1, more preferably less than about 0.01, and most preferably less than about 0.001. 
[117] Additional specific strategies for modifying stability of CBH1 cellulases are provided 
below: 

[118] (1) Decreasing the entropy of main-chain unfolding may introduce stability to the 
enzyme. For example, the introduction of proline residues may significantly stabilize the 
protein by decreasing the entropy of the unfolding (see, e.g., Watanabe, et al., Eur. J. 
Biochem. 226:277-283 (1994)). Similarly, glycine residues have no β-carbon, and thus have 
considerably greater backbone conformational freedom than many other residues. 
Replacement of glycines, preferably with alanines, may reduce the entropy of unfolding and 
improve stability (see, e.g., Matthews, et al., Proc. Natl. Acad. Sci. USA 84:6663-6667 
(1987)). Additionally, by shortening external loops it may be possible to improve stability. It 
has been observed that hyperthermophile produced proteins have shorter external loops 
than their mesophilic homologues (see, e.g., Russel, et al., Current Opinions in 
Biotechnology 6:370-374 (1995)). The introduction of disulfide bonds may also be effective 
to stabilize distinct tertiary structures in relation to each other. Thus, the introduction of 
cysteines at residues accessible to existing cysteines or the introduction of pairs of cysteines 
that could form disulfide bonds would alter the stability of a CBH1 variant. 
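
By way of illustration only, candidate glycine-to-alanine replacements of the kind discussed in this paragraph could be enumerated as sketched below. The function name and the short peptide are hypothetical; the peptide is not a CBH1 sequence, and the sketch merely lists positions without predicting whether any given substitution is stabilizing.

```python
def glycine_to_alanine_candidates(sequence):
    """List candidate Gly->Ala substitutions in standard notation (1-based)."""
    return [f"G{pos}A" for pos, aa in enumerate(sequence, start=1) if aa == "G"]


# Hypothetical peptide used only to show the output format.
example = "MGSGKPA"
candidates = glycine_to_alanine_candidates(example)
print(candidates)
```
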

[119] (2) Decreasing internal cavities by increasing side-chain hydrophobicity may alter 
the stability of an enzyme. Reducing the number and volume of internal cavities increases 
the stability of enzyme by maximizing hydrophobic interactions and reducing packing defects 
(see, e.g., Matthews, Ann. Rev. Biochem. 62:139-160 (1993); Burley, et al., Science 229:23- 
29 (1985); Zuber, Biophys. Chem. 29:171-179 (1988); Kellis, et al., Nature 333:784-786 
(1988)). It is known that multimeric proteins from thermophiles often have more hydrophobic 
sub-unit interfaces with greater surface complementarity than their mesophilic counterparts 
(Russel, et al., supra). This principle is believed to be applicable to domain interfaces of 
monomeric proteins. Specific substitutions that may improve stability by increasing 
hydrophobicity include lysine to arginine, serine to alanine and threonine to alanine (Russel, 
et al., supra). Modification by substitution to alanine or proline may increase side-chain size 
with resultant reduction in cavities, better packing and increased hydrophobicity. 
Substitutions to reduce the size of the cavity, increase hydrophobicity and improve the 
complementarity of the interfaces between the domains of CBH1 may improve stability of the 
enzyme. Specifically, modification of the specific residue at these positions with a different 
residue selected from any of phenylalanine, tryptophan, tyrosine, leucine and isoleucine may 
improve performance. 
[120] (3) Balancing charge in rigid secondary structure, i.e., α-helices and β-turns may 
improve stability. For example, neutralizing partial positive charges on a helix N-terminus 
with negative charge on aspartic acid may improve stability of the structure (see, e.g., 
Eriksson, et al., Science 255:178-183 (1992)). Similarly, neutralizing partial negative 
charges on helix C-terminus with positive charge may improve stability. Removing positive 
charge from interacting with peptide N-terminus in β-turns should be effective in conferring 
tertiary structure stability. Substitution with a non-positively charged residue could remove 
an unfavorable positive charge from interacting with an amide nitrogen present in a turn. 
[121] (4) Introducing salt bridges and hydrogen bonds to stabilize tertiary structures 
may be effective. For example, ion pair interactions, e.g., between aspartic acid or glutamic 
acid and lysine, arginine or histidine, may introduce strong stabilizing effects and may be 
used to attach different tertiary structure elements with a resultant improvement in 
thermostability. Additionally, increases in the number of charged residue/non-charged 
residue hydrogen bonds, and the number of hydrogen-bonds generally, may improve 
thermostability (see, e.g., Tanner, et al., Biochemistry 35:2597-2609 (1996)). Substitution 
with aspartic acid, asparagine, glutamic acid or glutamine may introduce a hydrogen bond 
with a backbone amide. Substitution with arginine may improve a salt bridge and introduce 
an H-bond into a backbone carbonyl. 

[122] (5) Avoiding thermolabile residues in general may increase thermal stability. For 
example, asparagine and glutamine are susceptible to deamidation and cysteine is 
susceptible to oxidation at high temperatures. Reducing the number of these residues in 
sensitive positions may result in improved thermostability (Russel, et al., supra). Substitution 
or deletion by any residue other than glutamine or cysteine may increase stability by 
avoidance of a thermolabile residue. 
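
As a non-limiting sketch, the residues named in this paragraph as thermolabile (asparagine, glutamine and cysteine) could be located in a sequence as follows. The function name and the example peptide are hypothetical, and the sketch does not assess whether a given position is a "sensitive position."

```python
# Residues named in the text, with the degradation route given for each.
THERMOLABILE = {"N": "deamidation", "Q": "deamidation", "C": "oxidation"}


def thermolabile_sites(sequence):
    """Return (position, residue, liability) for each thermolabile residue."""
    sites = []
    for pos, aa in enumerate(sequence, start=1):
        if aa in THERMOLABILE:
            sites.append((pos, aa, THERMOLABILE[aa]))
    return sites


# Hypothetical peptide, not a CBH1 sequence.
print(thermolabile_sites("MNQCA"))
```
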

[123] (6) Stabilization or destabilization of binding of a ligand that confers modified 
stability to CBH1 variants. For example, a component of the matrix in which the CBH1 
variants of this invention are used may bind to a specific surfactant/thermal sensitivity site of 
the CBH1 variant. By modifying the site through substitution, binding of the component to 
the variant may be strengthened or diminished. For example, a non-aromatic residue in the 
binding crevice of CBH1 may be substituted with phenylalanine or tyrosine to introduce 
aromatic side-chain stabilization, where the cellulose substrate may interact 
favorably with the benzyl rings, increasing the stability of the CBH1 variant. 
[124] (7) Increasing the electronegativity of any of the surfactant/thermal sensitivity 
ligands may improve stability under surfactant or thermal stress. For example, substitution 
with phenylalanine or tyrosine may increase the electronegativity of D (aspartate) residues 
by improving shielding from solvent, thereby improving stability. 

C. Homologous CBH1 Nucleic Acids and Polypeptides 

[125] Genomic DNA from microbial organisms is fixed to a membrane. The genomic DNA 
is hybridized with the gene specific probes and screened using PCR. The PCR product(s) 
are isolated using techniques well known in the art and sequenced. 

VI. Expression Of Recombinant CBH1 Homologs and Variants 

[126] The methods of the invention rely on the use of cells to express a desired cellulase, 
with no particular method of expression required. 

[127] The invention provides host cells that have been transduced, transformed or 
transfected with an expression vector comprising a desired cellulase-encoding nucleic acid 
sequence. The culture conditions, such as temperature, pH and the like, are those 
previously used for the parental host cell prior to transduction, transformation or transfection 
and will be apparent to those skilled in the art. 

[128] In one approach, a filamentous fungal cell or yeast cell is transfected with an 
expression vector having a promoter or biologically active promoter fragment or one or more 
(e.g., a series) of enhancers which functions in the host cell line, operably linked to a DNA 
segment encoding a desired cellulase, such that the desired cellulase is expressed in the cell 
line. 

A. Nucleic Acid Constructs/Expression Vectors. 
[129] Natural or synthetic polynucleotide fragments encoding a desired cellulase ("desired 
cellulase-encoding nucleic acid sequences") may be incorporated into heterologous nucleic 
acid constructs or vectors, capable of introduction into, and replication in, a filamentous 
fungal or yeast cell. The vectors and methods disclosed herein are suitable for use in host 
cells for the expression of a desired cellulase. Any vector may be used as long as it is 
replicable and viable in the cells into which it is introduced. Large numbers of suitable 
vectors and promoters are known to those of skill in the art, and are commercially available. 
Cloning and expression vectors are also described in Sambrook et al., 1989, Ausubel FM et 
al., 1989, and Strathern et al., 1981, each of which is expressly incorporated by reference 
herein. Appropriate expression vectors for fungi are described in van den Hondel, 
C.A.M.J.J., et al. (1991) In: Bennett, J.W. and Lasure, L.L. (eds.) More Gene Manipulations in 
Fungi. Academic Press, pp. 396-428. The appropriate DNA sequence may be inserted into 
a plasmid or vector (collectively referred to herein as "vectors") by a variety of procedures. 
In general, the DNA sequence is inserted into an appropriate restriction endonuclease site(s) 
by standard procedures. Such procedures and related sub-cloning procedures are deemed 
to be within the scope of knowledge of those skilled in the art. 
[130] Recombinant filamentous fungi comprising the coding sequence for a desired 
cellulase may be produced by introducing a heterologous nucleic acid construct comprising 
the desired cellulase coding sequence into the cells of a selected strain of the filamentous 
fungi. 

[131] Once the desired form of a desired cellulase nucleic acid sequence is obtained, it 
may be modified in a variety of ways. Where the sequence involves non-coding flanking 
regions, the flanking regions may be subjected to resection, mutagenesis, etc. Thus, 
transitions, transversions, deletions, and insertions may be performed on the naturally 
occurring sequence. 

[132] A selected desired cellulase coding sequence may be inserted into a suitable vector 
according to well-known recombinant techniques and used to transform filamentous fungi 
capable of cellulase expression. Due to the inherent degeneracy of the genetic code, other 
nucleic acid sequences which encode substantially the same or a functionally equivalent 
amino acid sequence may be used to clone and express a desired cellulase. Therefore it is 
appreciated that such substitutions in the coding region fall within the sequence variants 
covered by the present invention. 
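
The degeneracy referred to in this paragraph may be illustrated by the following sketch, which translates two different nucleotide sequences with the standard genetic code and shows that they encode the same peptide. The function name and both nine-base sequences are hypothetical examples and are not taken from any CBH1 gene.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    AMINO_ACIDS,
))


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two hypothetical sequences that differ at two wobble positions.
seq_a = "ATGGCTTGT"
seq_b = "ATGGCATGC"
print(translate(seq_a), translate(seq_b))
```
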

[133] The present invention also includes recombinant nucleic acid constructs comprising 
one or more of the desired cellulase-encoding nucleic acid sequences as described above. 
The constructs comprise a vector, such as a plasmid or viral vector, into which a sequence 
of the invention has been inserted, in a forward or reverse orientation. 
[134] Heterologous nucleic acid constructs may include the coding sequence for a desired 
cellulase: (i) in isolation; (ii) in combination with additional coding sequences, such as fusion 
protein or signal peptide coding sequences, where the desired cellulase coding sequence is 
the dominant coding sequence; (iii) in combination with non-coding sequences, such as 
introns and control elements, such as promoter and terminator elements or 5' and/or 3' 
untranslated regions, effective for expression of the coding sequence in a suitable host; 
and/or (iv) in a vector or host environment in which the desired cellulase coding sequence is 
a heterologous gene. 

[135] In one aspect of the present invention, a heterologous nucleic acid construct is 
employed to transfer a desired cellulase-encoding nucleic acid sequence into a cell in vitro, 
with established filamentous fungal and yeast lines preferred. For long-term production of a 
desired cellulase, stable expression is preferred. It follows that any method effective to 
generate stable transformants may be used in practicing the invention. 
[136] Appropriate vectors are typically equipped with a selectable marker-encoding nucleic 
acid sequence, insertion sites, and suitable control elements, such as promoter and 
termination sequences. The vector may comprise regulatory sequences, including, for 
example, non-coding sequences, such as introns and control elements, i.e., promoter and 
terminator elements or 5' and/or 3' untranslated regions, effective for expression of the 
coding sequence in host cells (and/or in a vector or host cell environment in which a 
modified soluble protein antigen coding sequence is not normally expressed), operably 
linked to the coding sequence. Large numbers of suitable vectors and promoters are known 
to those of skill in the art, many of which are commercially available and/or are described in 
Sambrook, et al., (supra). 

[137] Exemplary promoters include both constitutive promoters and inducible promoters, 
examples of which include a CMV promoter, an SV40 early promoter, an RSV promoter, an 
EF-1α promoter, a promoter containing the tet responsive element (TRE) in the tet-on or 
tet-off system as described (ClonTech and BASF), the beta actin promoter and the 
metallothionein promoter that can be upregulated by addition of certain metal salts. A promoter 
sequence is a DNA sequence which is recognized by the particular filamentous fungus for 
expression purposes. It is operably linked to DNA sequence encoding a variant CBH1 
polypeptide. Such linkage comprises positioning of the promoter with respect to the initiation 
codon of the DNA sequence encoding the variant CBH1 polypeptide in the disclosed 
expression vectors. The promoter sequence contains transcription and translation control 
sequences which mediate the expression of the variant CBH1 polypeptide. Examples include 
the promoters from the Aspergillus niger, A. awamori or A. oryzae glucoamylase, 
alpha-amylase, or alpha-glucosidase encoding genes; the A. nidulans gpdA or trpC genes; the 
Neurospora crassa cbh1 or trp1 genes; the A. niger or Rhizomucor miehei aspartic 
proteinase encoding genes; the H. jecorina cbh1, cbh2, egl1, egl2, or other cellulase 
encoding genes. 

[138] The choice of the proper selectable marker will depend on the host cell, and 
appropriate markers for different hosts are well known in the art. Typical selectable marker 
genes include argB from A. nidulans or H. jecorina, amdS from A. nidulans, pyr4 from 
Neurospora crassa or H. jecorina, pyrG from Aspergillus niger or A. nidulans. Additional 
exemplary selectable markers include, but are not limited to trpC, trp1, oliC31, niaD or leu2, 
which are included in heterologous nucleic acid constructs used to transform a mutant strain 
such as trp-, pyr-, leu- and the like. 

[139] Such selectable markers confer to transformants the ability to utilize a metabolite that 
is usually not metabolized by the filamentous fungi. For example, the amdS gene from H. 
jecorina encodes the enzyme acetamidase, which allows transformant cells to grow on 
acetamide as a nitrogen source. The selectable marker (e.g. pyrG) may restore the ability of 
an auxotrophic mutant strain to grow on a selective minimal medium or the selectable 
marker (e.g. oliC31) may confer to transformants the ability to grow in the presence of an 
inhibitory drug or antibiotic. 

[140] The selectable marker coding sequence is cloned into any suitable plasmid using 
methods generally employed in the art. Exemplary plasmids include pUC18, pBR322, pRAX 
and pUC100. The pRAX plasmid contains AMA1 sequences from A. nidulans, which make it 
possible to replicate in A. niger. 

[141] The practice of the present invention will employ, unless otherwise indicated, 
conventional techniques of molecular biology, microbiology, recombinant DNA, and 
immunology, which are within the skill of the art. Such techniques are explained fully in the 
literature. See, for example, Sambrook et al., 1989; Freshney, 1987; Ausubel, et al., 1993; 
and Coligan et al., 1991. All patents, patent applications, articles and publications 
mentioned herein, are hereby expressly incorporated herein by reference. 
B. Host Cells and Culture Conditions For CBH1 Production 
(i) Filamentous Fungi 
[142] Thus, the present invention provides filamentous fungi comprising cells which have 
been modified, selected and cultured in a manner effective to result in desired cellulase 
production or expression relative to the corresponding non-transformed parental fungi. 
[143] Examples of species of parental filamentous fungi that may be treated and/or 
modified for desired cellulase expression include, but are not limited to Trichoderma, 
Penicillium sp., Humicola sp., including Humicola insolens; Aspergillus sp., including 
Aspergillus niger, Chrysosporium sp., Fusarium sp., Hypocrea sp., and Emericella sp. 
[144] Cells expressing a desired cellulase are cultured under conditions typically employed 
to culture the parental fungal line. Generally, cells are cultured in a standard medium 
containing physiological salts and nutrients, such as described in Pourquie, J. et al., 
Biochemistry and Genetics of Cellulose Degradation, eds. Aubert, J. P. et al., Academic 
Press, pp. 71-86, 1988 and Ilmen, M. et al., Appl. Environ. Microbiol. 63:1298-1306, 1997. 
Culture conditions are also standard, e.g., cultures are incubated at 28°C in shaker cultures 
or fermenters until desired levels of desired cellulase expression are achieved. 
[145] Preferred culture conditions for a given filamentous fungus may be found in the 
scientific literature and/or from the source of the fungi such as the American Type Culture 
Collection (ATCC; <www.atcc.org>). After fungal growth has been established, the cells are 
exposed to conditions effective to cause or permit the expression of a desired cellulase. 
[146] In cases where a desired cellulase coding sequence is under the control of an 
inducible promoter, the inducing agent, e.g., a sugar, metal salt or antibiotics, is added to the 
medium at a concentration effective to induce desired cellulase expression. 
[147] In one embodiment, the strain comprises Aspergillus niger, which is a useful strain 
for obtaining overexpressed protein. For example, A. niger var. awamori dgr246 is known to 
secrete elevated amounts of cellulases (Goedegebuur et al., Curr. Genet. (2002) 41:89-98). 
Other strains of Aspergillus niger var. awamori such as GCDAP3, GCDAP4 and 
GAP3-4 are known (Ward, M., Wilson, L.J. and Kodama, K.H., 1993, Appl. 
Microbiol. Biotechnol. 39:738-743). 

[148] In another embodiment, the strain comprises Trichoderma reesei, which is a useful 
strain for obtaining overexpressed protein. For example, RL-P37, described by Sheir-Neiss, 
et al., Appl. Microbiol. Biotechnol. 20:46-53 (1984) is known to secrete elevated amounts of 
cellulase enzymes. Functional equivalents of RL-P37 include Trichoderma reesei strain 
RUT-C30 (ATCC No. 56765) and strain QM9414 (ATCC No. 26921). It is contemplated that 
these strains would also be useful in overexpressing variant CBH1. 
[149] Where it is desired to obtain the desired cellulase in the absence of potentially 
detrimental native cellulolytic activity, it is useful to obtain a host cell strain which has had 
one or more cellulase genes deleted prior to introduction of a DNA construct or plasmid 
containing the DNA fragment encoding the desired cellulase. Such strains may be prepared 
by the method disclosed in U.S. Patent No. 5,246,853 and WO 92/06209, which disclosures 
are hereby incorporated by reference. By expressing a desired cellulase in a host 
microorganism that is missing one or more cellulase genes, the identification and 
subsequent purification procedures are simplified. 

[150] Gene deletion may be accomplished by inserting a form of the desired gene to be 
deleted or disrupted into a plasmid by methods known in the art. The deletion plasmid is 
then cut at an appropriate restriction enzyme site(s), internal to the desired gene coding 
region, and the gene coding sequence or part thereof replaced with a selectable marker. 
Flanking DNA sequences from the locus of the gene to be deleted or disrupted, preferably 
between about 0.5 and 2.0 kb, remain on either side of the selectable marker gene. An 
appropriate deletion plasmid will generally have unique restriction enzyme sites present 
therein to enable the fragment containing the deleted gene, including flanking DNA 
sequences, and the selectable marker gene to be removed as a single linear piece. 
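
The layout of the linear deletion fragment described in this paragraph (flanking DNA, selectable marker, flanking DNA) may be sketched as follows. The function name, its parameters, and the placeholder sequences are hypothetical; the 500 and 2000 base limits simply restate the preferred flank length of about 0.5 to 2.0 kb, and the sketch reports rather than enforces that preference.

```python
def build_deletion_cassette(upstream, marker, downstream,
                            min_flank=500, max_flank=2000):
    """Join 5' flank, selectable marker and 3' flank into one linear fragment.

    Returns (cassette, flanks_in_preferred_range). The length limits are the
    preferred range from the text, expressed in bases.
    """
    in_range = all(min_flank <= len(flank) <= max_flank
                   for flank in (upstream, downstream))
    return upstream + marker + downstream, in_range


# Placeholder sequences only; real flanks come from the target locus.
cassette, ok = build_deletion_cassette("A" * 600, "ATGC", "C" * 700)
print(len(cassette), ok)
```
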
[151] A selectable marker must be chosen so as to enable detection of the transformed 
microorganism. Any selectable marker gene that is expressed in the selected 
microorganism will be suitable. For example, with Aspergillus sp., the selectable marker is 
chosen so that the presence of the selectable marker in the transformants will not 
significantly affect the properties thereof. Such a selectable marker may be a gene that 
encodes an assayable product. For example, a functional copy of an Aspergillus sp. gene 
may be used which if lacking in the host strain results in the host strain displaying an 
auxotrophic phenotype. 

[152] In a preferred embodiment, a pyrG- derivative strain of Aspergillus sp. is transformed 
with a functional pyrG gene, which thus provides a selectable marker for transformation. A 
pyrG- derivative strain may be obtained by selection of Aspergillus sp. strains that are 
resistant to fluoroorotic acid (FOA). The pyrG gene encodes orotidine-5'-monophosphate 
decarboxylase, an enzyme required for the biosynthesis of uridine. Strains with an intact 
pyrG gene grow in a medium lacking uridine but are sensitive to fluoroorotic acid. It is 
possible to select pyrG- derivative strains that lack a functional orotidine monophosphate 
decarboxylase enzyme and require uridine for growth by selecting for FOA resistance. 
Using the FOA selection technique it is also possible to obtain uridine-requiring strains which 
lack a functional orotate pyrophosphoribosyl transferase. It is possible to transform these 
cells with a functional copy of the gene encoding this enzyme (Berges & Barreau, Curr. 
Genet. 19:359-365 (1991), and van Hartingsveldt et al., (1986) Development of a 
homologous transformation system for Aspergillus niger based on the pyrG gene. Mol. Gen. 
Genet. 206:71-75). Selection of derivative strains is easily performed using the FOA 
resistance technique referred to above, and thus, the pyrG gene is preferably employed as a 
selectable marker. 
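
The selection logic set out in this paragraph may be summarized by the following sketch. It encodes only the two rules stated above (strains with an intact pyrG gene are sensitive to fluoroorotic acid, and strains lacking functional pyrG require uridine); the function name and arguments are hypothetical, and no other growth requirement is modeled.

```python
def grows(pyrg_functional, uridine_in_medium, foa_in_medium):
    """Predict growth from the pyrG phenotype rules described in the text."""
    if pyrg_functional and foa_in_medium:
        return False  # an intact pyrG gene makes the strain FOA-sensitive
    if not pyrg_functional and not uridine_in_medium:
        return False  # strains lacking functional pyrG require uridine
    return True


# Selecting the auxotrophic recipient: uridine supplied, FOA present.
print(grows(pyrg_functional=False, uridine_in_medium=True, foa_in_medium=True))
# Selecting transformants: no uridine, no FOA.
print(grows(pyrg_functional=True, uridine_in_medium=False, foa_in_medium=False))
```
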

[153] To transform pyrG- Aspergillus sp. so as to be lacking in the ability to express one or 
more cellulase genes, a single DNA fragment comprising a disrupted or deleted cellulase 
gene is then isolated from the deletion plasmid and used to transform an appropriate pyr- 
Aspergillus host. Transformants are then identified and selected based on their ability to 
express the pyrG gene product and thus complement the uridine auxotrophy of the host 
strain. Southern blot analysis is then carried out on the resultant transformants to identify 
and confirm a double crossover integration event that replaces part or all of the coding 
region of the genomic copy of the gene to be deleted with the pyr4 selectable markers. 
[154] Although the specific plasmid vectors described above relate to preparation of pyr 
transformants, the present invention is not limited to these vectors. Various genes can be 
deleted and replaced in the Aspergillus sp. strain using the above techniques. In addition, 
any available selectable markers can be used, as discussed above. In fact, any Aspergillus 
sp. gene that has been cloned, and thus identified, can be deleted from the genome using 
the above-described strategy. 

[155] As stated above, the host strains used are derivatives of Aspergillus sp. that lack or 
have a nonfunctional gene or genes corresponding to the selectable marker chosen. For 
example, if the selectable marker of pyrG is chosen, then a specific pyrG- derivative strain is 
used as a recipient in the transformation procedure. Similarly, selectable markers 
comprising Aspergillus sp. genes equivalent to the Aspergillus nidulans genes amdS, argB, 
trpC, niaD may be used. The corresponding recipient strain must therefore be a derivative 
strain such as argB-, trpC-, niaD-, respectively. 

[156] DNA encoding the desired cellulase is then prepared for insertion into an appropriate 
microorganism. According to the present invention, DNA encoding a desired cellulase 
comprises the DNA necessary to encode for a protein that has functional cellulolytic activity. 
The DNA fragment encoding the desired cellulase may be functionally attached to a fungal 
promoter sequence, for example, the promoter of the glaA gene. 

[157] It is also contemplated that more than one copy of DNA encoding a desired cellulase 
may be recombined into the strain to facilitate overexpression. The DNA encoding the 
desired cellulase may be prepared by the construction of an expression vector carrying the 
DNA encoding the cellulase. The expression vector carrying the inserted DNA fragment 
encoding the desired cellulase may be any vector which is capable of replicating 
autonomously in a given host organism or of integrating into the DNA of the host, typically a 
plasmid. In preferred embodiments two types of expression vectors for obtaining expression 
of genes are contemplated. The first contains DNA sequences in which the promoter, gene- 
coding region, and terminator sequence all originate from the gene to be expressed. Gene 
truncation may be obtained where desired by deleting undesired DNA sequences (e.g., 
coding for unwanted domains) to leave the domain to be expressed under control of its own 
transcriptional and translational regulatory sequences. A selectable marker is also 
contained on the vector allowing the selection for integration into the host of multiple copies 
of the novel gene sequences. 

[158] The second type of expression vector is preassembled and contains sequences 
required for high-level transcription and a selectable marker. It is contemplated that the 
coding region for a gene or part thereof can be inserted into this general-purpose expression 
vector such that it is under the transcriptional control of the expression cassette's promoter 
and terminator sequences. For example, pRAX is such a general-purpose expression 
vector. Genes or part thereof can be inserted downstream of the strong glaA promoter. 
[159] In the vector, the DNA sequence encoding the desired cellulase of the present 
invention should be operably linked to transcriptional and translational sequences, i.e., a 
suitable promoter sequence and signal sequence in reading frame to the structural gene. 
The promoter may be any DNA sequence that shows transcriptional activity in the host cell 
and may be derived from genes encoding proteins either homologous or heterologous to the 
host cell. An optional signal peptide provides for extracellular production of the desired 
cellulase. The DNA encoding the signal sequence is preferably that which is naturally 
associated with the gene to be expressed, however the signal sequence from any suitable 
source is contemplated in the present invention. 

[160] The procedures used to fuse the DNA sequences coding for the desired cellulase of 
the present invention with the promoter into suitable vectors are well known in the art. 
[161] The DNA vector or construct described above may be introduced in the host cell in 
accordance with known techniques such as transformation, transfection, microinjection, 
microporation, biolistic bombardment and the like. 

[162] The preferred method in the present invention to prepare Aspergillus sp. for 
transformation involves the preparation of protoplasts from fungal mycelium. See Campbell 
et al., Improved transformation efficiency of A. niger using homologous niaD gene for nitrate 
reductase. Curr. Genet. 16:53-56; 1989. The mycelium can be obtained from germinated 
vegetative spores. The mycelium is treated with an enzyme that digests the cell wall 
resulting in protoplasts. The protoplasts are then protected by the presence of an osmotic 
stabilizer in the suspending medium. These stabilizers include sorbitol, mannitol, potassium 
chloride, magnesium sulfate and the like. Usually the concentration of these stabilizers 
varies between 0.8 M and 1.2 M. It is preferable to use about a 1.2 M solution of sorbitol in 
the suspension medium. 
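
As a worked illustration of the stabilizer concentrations given in this paragraph, the mass of sorbitol needed for a chosen volume may be computed as sketched below. The molar mass of sorbitol (182.17 g/mol) is a standard value that is not stated in the text, and the function name and the 100 mL batch size are assumptions made for the example.

```python
# Molar mass of sorbitol (C6H14O6); a standard value, not taken from the text.
SORBITOL_G_PER_MOL = 182.17


def grams_required(molarity, volume_l, molar_mass=SORBITOL_G_PER_MOL):
    """Mass of solute (g) needed for the given molarity (mol/L) and volume (L)."""
    return molarity * volume_l * molar_mass


# Preferred 1.2 M sorbitol, for a hypothetical 100 mL of suspension medium.
print(round(grams_required(1.2, 0.1), 2))
```

For the hypothetical 100 mL batch this gives roughly 21.9 g of sorbitol at 1.2 M, and roughly 14.6 g at the lower bound of 0.8 M.
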

[163] Uptake of the DNA into the host Aspergillus sp. strain is dependent upon the calcium 
ion concentration. Generally between about 10 mM CaCl2 and 50 mM CaCl2 is used in an 
uptake solution. Besides the need for the calcium ion in the uptake solution, other items 
generally included are a buffering system such as TE buffer (10 mM Tris, pH 7.4; 1 mM 
EDTA) or 10 mM MOPS, pH 6.0 buffer (morpholinepropanesulfonic acid) and polyethylene 
glycol (PEG). It is believed that the polyethylene glycol acts to fuse the cell membranes thus 
permitting the contents of the medium to be delivered into the cytoplasm of the Aspergillus 
sp. strain and the plasmid DNA is transferred to the nucleus. This fusion frequently leaves 
multiple copies of the plasmid DNA tandemly integrated into the host chromosome. 
[164] Usually a suspension containing the Aspergillus sp. protoplasts or cells that have 
been subjected to a permeability treatment at a density of 10^5 to 10^6/mL, preferably 2 x 
10^5/mL, is used in transformation. A volume of 100 µL of these protoplasts or cells in an 
appropriate solution (e.g., 1.2 M sorbitol; 50 mM CaCl2) is mixed with the desired DNA. 
Generally a high concentration of PEG is added to the uptake solution. From 0.1 to 1 
volume of 25% PEG 4000 can be added to the protoplast suspension. However, it is 
preferable to add about 0.25 volumes to the protoplast suspension. Additives such as 
dimethyl sulfoxide, heparin, spermidine, potassium chloride and the like may also be added 
to the uptake solution and aid in transformation. 

[165] Generally, the mixture is then incubated at approximately 0°C for a period of between 
10 to 30 minutes. Additional PEG is then added to the mixture to further enhance the uptake 
of the desired gene or DNA sequence. The 25% PEG 4000 is generally added in volumes of 
5 to 15 times the volume of the transformation mixture; however, greater and lesser volumes 
may be suitable. The 25% PEG 4000 is preferably about 10 times the volume of the 
transformation mixture. After the PEG is added, the transformation mixture is then incubated 
either at room temperature or on ice before the addition of a sorbitol and CaCl2 solution. 
The protoplast suspension is then further added to molten aliquots of a growth medium. 
This growth medium permits the growth of transformants only. Any growth medium can be 
used in the present invention that is suitable to grow the desired transformants. However, if 
Pyr+ transformants are being selected it is preferable to use a growth medium that contains 
no uridine. The subsequent colonies are transferred and purified on a growth medium 
depleted of uridine. 
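
The preferred quantities given for the two PEG additions may be worked through as in the following sketch. It assumes that the stated protoplast volume is 100 microliters at the preferred density of 2 x 10^5 per mL, and it ignores the volume of the added DNA; the function name and the returned keys are hypothetical.

```python
def transformation_volumes(protoplast_ul=100.0, density_per_ml=2e5,
                           first_peg_fraction=0.25, second_peg_fold=10.0):
    """Work out cell number and PEG volumes (microliters) for one transformation.

    Defaults restate the preferred values from the text; the DNA volume is
    ignored, which is a simplifying assumption.
    """
    cells = density_per_ml * protoplast_ul / 1000.0
    first_peg_ul = protoplast_ul * first_peg_fraction
    mixture_ul = protoplast_ul + first_peg_ul
    second_peg_ul = mixture_ul * second_peg_fold
    return {
        "cells": cells,
        "first_peg_ul": first_peg_ul,
        "mixture_ul": mixture_ul,
        "second_peg_ul": second_peg_ul,
    }


print(transformation_volumes())
```

Under these assumptions the first addition is 25 microliters of 25% PEG 4000, and the second addition is 1250 microliters.
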

[166] At this stage, stable transformants may be distinguished from unstable transformants 
by their faster growth rate and the formation of circular colonies with a smooth, rather than 
ragged, outline on solid culture medium lacking uridine. Additionally, in some cases a further
test of stability may be made by growing the transformants on solid non-selective medium (i.e.,
containing uridine), harvesting spores from this culture medium and determining the 
percentage of these spores which will subsequently germinate and grow on selective 
medium lacking uridine. 
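
The further stability test reduces to a single percentage. A minimal sketch follows; the function name and the example counts are hypothetical and are not data from this work.

```python
def percent_stable(colonies_on_selective, spores_plated):
    """Percentage of spores harvested from non-selective medium that
    germinate and grow on selective medium lacking uridine."""
    if spores_plated <= 0:
        raise ValueError("spores_plated must be positive")
    return 100.0 * colonies_on_selective / spores_plated


# Hypothetical counts for illustration only.
example_percent = percent_stable(45, 50)
```

A value near 100% would be consistent with a stable transformant, while a low value would suggest loss of the selectable marker during non-selective growth.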

[167] In a particular embodiment of the above method, the desired cellulase(s) are
recovered in active form from the host cell after growth in liquid media as a result of
the appropriate post-translational processing of the desired cellulase.

(ii) Yeast

[168] The present invention also contemplates the use of yeast as a host cell for desired 
cellulase production. Several other genes encoding hydrolytic enzymes have been 
expressed in various strains of the yeast S. cerevisiae. These include sequences encoding
two endoglucanases (Penttila et al., 1987), two cellobiohydrolases (Penttila et al., 1988)
and one beta-glucosidase from Trichoderma reesei (Cummings and Fowler, 1996), a
xylanase from Aureobasidium pullulans (Li and Ljungdahl, 1996), an alpha-amylase from
wheat (Rothstein et al., 1987), etc. In addition, a cellulase gene cassette encoding the
Butyrivibrio fibrisolvens endo-beta-1,4-glucanase (END1), Phanerochaete chrysosporium
cellobiohydrolase (CBH1), the Ruminococcus flavefaciens cellodextrinase (CEL1) and the
Endomyces fibuliger cellobiase (Bgl1) was successfully expressed in a laboratory strain of S.
cerevisiae (Van Rensburg et al., 1998).

C. Introduction of a Desired Cellulase-Encoding Nucleic Acid Sequence into
Host Cells.

[169] The invention further provides cells and cell compositions which have been 
genetically modified to comprise an exogenously provided desired cellulase-encoding
nucleic acid sequence. A parental cell or cell line may be genetically modified (i.e.,
transduced, transformed or transfected) with a cloning vector or an expression vector. The
vector may be, for example, in the form of a plasmid, a viral particle, a phage, etc., as further
described above. 

[170] The methods of transformation of the present invention may result in the stable 
integration of all or part of the transformation vector into the genome of the filamentous 
fungus. However, transformation resulting in the maintenance of a self-replicating extra- 
chromosomal transformation vector is also contemplated. 

[171] Many standard transfection methods can be used to produce Trichoderma reesei cell
lines that express large quantities of the heterologous protein. Some of the published
methods for the introduction of DNA constructs into cellulase-producing strains of
Trichoderma include Lorito, Hayes, DiPietro and Harman, 1993, Curr. Genet. 24: 349-356;
Goldman, VanMontagu and Herrera-Estrella, 1990, Curr. Genet. 17:169-174; and Penttila,
Nevalainen, Ratto, Salminen and Knowles, 1987, Gene 61: 155-164; for Aspergillus, Yelton,
Hamer and Timberlake, 1984, Proc. Natl. Acad. Sci. USA 81: 1470-1474; for Fusarium,
Bajar, Podila and Kolattukudy, 1991, Proc. Natl. Acad. Sci. USA 88: 8202-8212; for
Streptomyces, Hopwood et al., 1985, The John Innes Foundation, Norwich, UK; and for
Bacillus, Brigidi, DeRossi, Bertarini, Riccardi and Matteuzzi, 1990, FEMS Microbiol. Lett. 55:
135-138.

[172] Any of the well-known procedures for introducing foreign nucleotide sequences into 
host cells may be used. These include the use of calcium phosphate transfection, 
polybrene, protoplast fusion, electroporation, biolistics, liposomes, microinjection, plasma 
vectors, viral vectors and any of the other well known methods for introducing cloned 
genomic DNA, cDNA, synthetic DNA or other foreign genetic material into a host cell (see, 
e.g., Sambrook et al., supra). Also of use is the Agrobacterium-mediated transfection 
method described in U.S. Patent No. 6,255,115. It is only necessary that the particular
genetic engineering procedure used be capable of successfully introducing at least one gene 
into the host cell capable of expressing the heterologous gene. 

[173] In addition, heterologous nucleic acid constructs comprising a desired cellulase-
encoding nucleic acid sequence can be transcribed in vitro, and the resulting RNA
introduced into the host cell by well-known methods, e.g., by injection.

[174] The invention further includes novel and useful transformants of filamentous fungi
such as H. jecorina and A. niger for use in producing fungal cellulase compositions. The
invention includes transformants of filamentous fungi, especially fungi comprising the desired
cellulase coding sequence, or deletion of the endogenous cbh coding sequence.

VII. Analysis For CBH1 Nucleic Acid Coding Sequences and/or Protein Expression. 

[175] In order to evaluate the expression of a desired cellulase by a cell line that has been 
transformed with a desired cellulase-encoding nucleic acid construct, assays can be carried 
out at the protein level, the RNA level or by use of functional bioassays particular to 
cellobiohydrolase activity and/or production. 

[176] In one exemplary application of the desired cellulase nucleic acid and protein 
sequences described herein, a genetically modified strain of filamentous fungi, e.g.,
Trichoderma reesei, is engineered to produce an increased amount of a desired cellulase.
Such genetically modified filamentous fungi would be useful to produce a cellulase product
with greater cellulolytic capacity. In one approach, this is accomplished by
introducing the coding sequence for a desired cellulase into a suitable host, e.g., a
filamentous fungus such as Aspergillus niger.

[177] Accordingly, the invention includes methods for expressing a desired cellulase in a 
filamentous fungus or other suitable host by introducing an expression vector containing the 
DNA sequence encoding a desired cellulase into cells of the filamentous fungus or other 
suitable host. 

[178] In another aspect, the invention includes methods for modifying the expression of a 
desired cellulase in a filamentous fungus or other suitable host. Such modification includes 
a decrease or elimination in expression of the endogenous CBH. 

[179] In general, assays employed to analyze the expression of a desired cellulase include
Northern blotting, dot blotting (DNA or RNA analysis), RT-PCR (reverse transcriptase 
polymerase chain reaction), or in situ hybridization, using an appropriately labeled probe 
(based on the nucleic acid coding sequence) and conventional Southern blotting and 
autoradiography. 

[180] In addition, the production and/or expression of a desired cellulase may be measured 
in a sample directly, for example, by assays for cellobiohydrolase activity, expression and/or 
production. Such assays are described, for example, in Becker et al., Biochem J. (2001)
356:19-30 and Mitsuishi et al., FEBS (1990) 275:135-138, each of which is expressly 
incorporated by reference herein. The ability of CBH1 to hydrolyze isolated soluble and 
insoluble substrates can be measured using assays described in Srisodsuk et al., J. Biotech. 
(1997) 57:49-57 and Nidetzky and Claeyssens Biotech. Bioeng. (1994) 44:961-966. 
Substrates useful for assaying cellobiohydrolase, endoglucanase or β-glucosidase activities
include crystalline cellulose, filter paper, phosphoric acid swollen cellulose,
cellooligosaccharides, methylumbelliferyl lactoside, methylumbelliferyl cellobioside,
orthonitrophenyl lactoside, paranitrophenyl lactoside, orthonitrophenyl cellobioside and
paranitrophenyl cellobioside.

[181] In addition, protein expression may be evaluated by immunological methods, such
as immunohistochemical staining of cells, tissue sections or immunoassay of tissue culture 
medium, e.g., by Western blot or ELISA. Such immunoassays can be used to qualitatively 
and quantitatively evaluate expression of a desired cellulase. The details of such methods 
are known to those of skill in the art and many reagents for practicing such methods are 
commercially available. 

[182] A purified form of a desired cellulase may be used to produce either monoclonal or 
polyclonal antibodies specific to the expressed protein for use in various immunoassays. 
(See, e.g., Hu et al., 1991). Exemplary assays include ELISA, competitive immunoassays, 
radioimmunoassays, Western blot, indirect immunofluorescent assays and the like. In 
general, commercially available antibodies and/or kits may be used for the quantitative 
immunoassay of the expression level of cellobiohydrolase proteins. 

VIII. Isolation And Purification Of Recombinant CBH1 Protein. 

[183] In general, a desired cellulase protein produced in cell culture is secreted into the 
medium and may be purified or isolated, e.g., by removing unwanted components from the 
cell culture medium. However, in some cases, a desired cellulase protein may be produced 
in a cellular form necessitating recovery from a cell lysate. In such cases the desired 
cellulase protein is purified from the cells in which it was produced using techniques routinely 
employed by those of skill in the art. Examples include, but are not limited to, affinity 
chromatography (Tilbeurgh et al., 1984), ion-exchange chromatographic methods (Goyal et
al., 1991; Fliess et al., 1983; Bhikhabhai et al., 1984; Ellouz et al., 1987), including ion-
exchange using materials with high resolution power (Medve et al., 1998), hydrophobic
interaction chromatography (Tomaz and Queiroz, 1999), and two-phase partitioning 
(Brumbauer, et al., 1999). 

[184] Typically, the desired cellulase protein is fractionated to segregate proteins having
selected properties, such as binding affinity to particular binding agents, e.g., antibodies or
receptors; or which have a selected molecular weight range, or range of isoelectric points.

[185] Once expression of a given desired cellulase protein is achieved, the desired
cellulase protein thereby produced is purified from the cells or cell culture. Exemplary 
procedures suitable for such purification include the following: antibody-affinity column
chromatography; ion exchange chromatography; ethanol precipitation; reverse phase HPLC;
chromatography on silica or on an anion-exchange resin such as DEAE; chromatofocusing;
SDS-PAGE; ammonium sulfate precipitation; and gel filtration using, e.g., Sephadex G-75. 
Various methods of protein purification may be employed and such methods are known in 
the art and described e.g. in Deutscher, 1990; Scopes, 1982. The purification step(s) 
selected will depend, e.g., on the nature of the production process used and the particular 
protein produced. 

IX. Utility of cbh1 and CBH1

[186] It can be appreciated that the desired cellulase nucleic acids, the desired cellulase 
protein and compositions comprising desired cellulase protein activity find utility in a wide
variety of applications, some of which are described below.

[187] New and improved cellulase compositions that comprise varying amounts of a 
desired cellulase find utility in detergent compositions that exhibit enhanced cleaning ability, 
function as a softening agent and/or improve the feel of cotton fabrics (e.g., "stone washing" 
or "biopolishing"), in compositions for degrading wood pulp into sugars (e.g., for bio-ethanol 
production), and/or in feed compositions. The isolation and characterization of cellulase of 
each type provides the ability to control the aspects of such compositions. 

[188] Desired cellulases with decreased thermostability find uses, for example, in areas
where the enzyme activity is required to be neutralized at lower temperatures so that other 
enzymes that may be present are left unaffected. In addition, the enzymes may find utility in 
the limited conversion of cellulosics, for example, in controlling the degree of crystallinity or 
of cellulosic chain-length. After reaching the desired extent of conversion the saccharifying 
temperature can be raised above the survival temperature of the de-stabilized CBH1. As the 
CBH1 activity is essential for hydrolysis of crystalline cellulose, conversion of crystalline 
cellulose will cease at the elevated temperature. 

[189] In one approach, the cellulase of the invention finds utility in detergent compositions 
or in the treatment of fabrics to improve the feel and appearance. 

[190] Since the rate of hydrolysis of cellulosic products may be increased by using a
transformant having at least one additional copy of the desired cellulase gene, either as a 
replicative plasmid or inserted into the genome, products that contain cellulose or 
heteroglycans can be degraded at a faster rate and to a greater extent. Products made from 
cellulose such as paper, cotton, cellulosic diapers and the like can be degraded more 
efficiently in a landfill. Thus, the fermentation product obtainable from the transformants or 
the transformants alone may be used in compositions to help degrade by liquefaction a 
variety of cellulose products that add to the overcrowded landfills. 

[191] Separate saccharification and fermentation is a process whereby cellulose present in 
biomass, e.g., corn stover, is converted to glucose and subsequently yeast strains convert 
glucose into ethanol. Simultaneous saccharification and fermentation is a process whereby 
cellulose present in biomass, e.g., corn stover, is converted to glucose and, at the same time 
and in the same reactor, yeast strains convert glucose into ethanol. Thus, in another 
approach, the desired cellulase of the invention finds utility in the degradation of biomass to 
ethanol. Ethanol production from readily available sources of cellulose provides a stable, 
renewable fuel source. 

[192] Cellulose-based feedstocks are comprised of agricultural wastes, grasses and woods 
and other low-value biomass such as municipal waste (e.g., recycled paper, yard clippings, 
etc.). Ethanol may be produced from the fermentation of any of these cellulosic feedstocks. 
However, the cellulose must first be converted to sugars before there can be conversion to 
ethanol. 

[193] A large variety of feedstocks may be used with the inventive desired cellulase(s) and 
the one selected for use may depend on the region where the conversion is being done. For 
example, in the Midwestern United States agricultural wastes such as wheat straw, corn 
stover and bagasse may predominate while in California rice straw may predominate. 
However, it should be understood that any available cellulosic biomass may be used in any 
region. 

[194] A cellulase composition containing an enhanced amount of cellobiohydrolase finds 
utility in ethanol production. Ethanol from this process can be further used as an octane 
enhancer or directly as a fuel in lieu of gasoline which is advantageous because ethanol as a 
fuel source is more environmentally friendly than petroleum derived products. It is known 
that the use of ethanol will improve air quality and possibly reduce local ozone levels and 
smog. Moreover, utilization of ethanol in lieu of gasoline can be of strategic importance in 
buffering the impact of sudden shifts in non-renewable energy and petro-chemical supplies. 

[195] Ethanol can be produced via saccharification and fermentation processes from
cellulosic biomass such as trees, herbaceous plants, municipal solid waste and agricultural 
and forestry residues. However, the ratio of individual cellulase enzymes within a naturally 
occurring cellulase mixture produced by a microbe may not be the most efficient for rapid 
conversion of cellulose in biomass to glucose. It is known that endoglucanases act to 
produce new cellulose chain ends which themselves are substrates for the action of 
cellobiohydrolases and thereby improve the efficiency of hydrolysis of the entire cellulase 
system. Therefore, the use of increased or optimized cellobiohydrolase activity may greatly 
enhance the production of ethanol. 

[196] Thus, the inventive cellobiohydrolase(s) finds use in the hydrolysis of cellulose to its 
sugar components. In one embodiment, a desired cellulase is added to the biomass prior to 
the addition of a fermentative organism. In a second embodiment, a desired cellulase is 
added to the biomass at the same time as a fermentative organism. Optionally, there may 
be other cellulase components present in either embodiment. 

[197] In another embodiment the cellulosic feedstock may be pretreated. Pretreatment 
may be by elevated temperature and the addition of dilute acid, concentrated acid
or dilute alkali solution. The pretreatment solution is added for a time sufficient to at least
partially hydrolyze the hemicellulose components and then neutralized.

[198] The detergent compositions of this invention may employ, besides the cellulase
composition (irrespective of the cellobiohydrolase content, i.e., cellobiohydrolase-free,
substantially cellobiohydrolase-free, or cellobiohydrolase enhanced), a surfactant, including
anionic, non-ionic and ampholytic surfactants, a hydrolase, building agents, bleaching
agents, bluing agents and fluorescent dyes, caking inhibitors, solubilizers, cationic
surfactants and the like. All of these components are known in the detergent art. The
cellulase composition as described above can be added to the detergent composition
in a liquid diluent, in granules, in emulsions, in gels, in pastes, and the like. Such forms are
well known to the skilled artisan. When a solid detergent composition is employed, the 
cellulase composition is preferably formulated as granules. Preferably, the granules can be 
formulated so as to contain a cellulase protecting agent. For a more thorough discussion, 
see US Patent Number 6,162,782 entitled "Detergent compositions containing cellulase 
compositions deficient in CBH1 type components," which is incorporated herein by 
reference. 

[199] Preferably the cellulase compositions are employed from about 0.00005 weight 
percent to about 5 weight percent relative to the total detergent composition. More 
preferably, the cellulase compositions are employed from about 0.0002 weight percent to
about 2 weight percent relative to the total detergent composition. 
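
The weight-percent ranges above translate into absolute amounts for a given batch of detergent as follows. This is an illustrative sketch only; the function names and the 1 kg batch size are hypothetical.

```python
# Ranges from the text, in weight percent of the total detergent composition.
BROAD_RANGE_WT_PCT = (0.00005, 5.0)
PREFERRED_RANGE_WT_PCT = (0.0002, 2.0)


def cellulase_mass_g(detergent_mass_g, weight_percent):
    """Mass of cellulase composition for a given total detergent mass."""
    return detergent_mass_g * weight_percent / 100.0


def dose_range_g(detergent_mass_g, wt_pct_range):
    """Lower and upper cellulase mass (g) for a weight-percent range."""
    low, high = wt_pct_range
    return (cellulase_mass_g(detergent_mass_g, low),
            cellulase_mass_g(detergent_mass_g, high))


# Hypothetical 1 kg batch of detergent.
broad_g = dose_range_g(1000.0, BROAD_RANGE_WT_PCT)
preferred_g = dose_range_g(1000.0, PREFERRED_RANGE_WT_PCT)
```

For a 1 kg batch, the broad range spans roughly 0.5 mg to 50 g of cellulase composition, and the preferred range roughly 2 mg to 20 g.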

[200] In addition, the desired cellulase nucleic acid sequence finds utility in the identification
and characterization of related nucleic acid sequences. A number of techniques useful for
determining (predicting or confirming) the function of related genes or gene products include,
but are not limited to, (A) DNA/RNA analysis, such as (1) overexpression, ectopic
expression, and expression in other species; (2) gene knock-out (reverse genetics, targeted
knock-out, viral induced gene silencing (VIGS, see Baulcombe, 1999)); (3) analysis of the
methylation status of the gene, especially flanking regulatory regions; and (4) in situ
hybridization; (B) gene product analysis such as (1) recombinant protein expression; (2)
antisera production; (3) immunolocalization; (4) biochemical assays for catalytic or other
activity; (5) phosphorylation status; and (6) interaction with other proteins via yeast two- 
hybrid analysis; (C) pathway analysis, such as placing a gene or gene product within a 
particular biochemical or signaling pathway based on its overexpression phenotype or by 
sequence homology with related genes; and (D) other analyses which may also be 
performed to determine or confirm the participation of the isolated gene and its product in a 
particular metabolic or signaling pathway, and help determine gene function. 



EXAMPLES 

[201] The present invention is described in further detail in the following examples which
are not in any way intended to limit the scope of the invention as claimed. The attached 
Figures are meant to be considered as integral parts of the specification and description of 
the invention. All references cited are herein specifically incorporated by reference for all 
that is described therein. 

Example 1 
Identification of CBH1 homologs 

[202] This example illustrates the novel CBH1 homologs found in a variety of fungi. 
Genomic DNA was prepared for several different microorganisms for the purpose of 
undertaking a PCR reaction to determine whether homologous CBH1 cellulases are 
encoded by the DNA of a particular organism. 

Isolation of Genomic DNA

[203] Genomic DNA may be isolated using any method known in the art. In this set of
experiments we received 48 genomic DNA solutions from diverse Hypocrea and
Trichoderma species through a collaboration with the Technical University of Vienna (TUV),
including Hypocrea schweinitzii (CBS 243.63), Hypocrea orientalis (PPRI 3894), Trichoderma
pseudokoningii (CBS 408.91) and Trichoderma konilangbra (isolate 1). However, the
following protocol may be used:

[204] Cells are grown at 30°C in 20 ml Potato Dextrose Broth (PDB) for 24 hours. The 
cells are diluted 1:20 in fresh PDB medium and grown overnight. Two milliliters of cells are
centrifuged and the pellet washed in 1 ml KC (60 g KCl, 2 g citric acid per liter, pH adjusted to
6.2 with 1 M KOH). The cell pellet is resuspended in 900 µl KC. 100 µl (20 mg/ml)
Novozyme® is added, mixed gently and the protoplastation followed microscopically at 37°C
until greater than 95% protoplasts are formed, for a maximum of 2 hours. The cells are
centrifuged at 1500 rpm (460 g) for 10 minutes. 200 µl TES/SDS (10 mM Tris, 50 mM EDTA,
150 mM NaCl, 1% SDS) is added, mixed and incubated at room temperature for 5 minutes.
DNA is isolated using a Qiagen mini-prep isolation kit (Qiagen). The column is eluted with
100 µl milli-Q water and the DNA collected.

[205] Alternatively, the FastPrep® method may be used. The system
consists of the FastPrep® Instrument as well as FastPrep® kits for nucleic acid isolation.
FastPrep® is available from Qbiogene.

Construction of primers

[206] PCR was performed on a standard PCR machine such as the PTC-200 Peltier
Thermal Cycler from MJ Research Inc. under the following conditions:

1) 1 minute at 96°C for 1 cycle

2) 30 seconds at 94°C
   90 seconds at 45°C (+1°C per cycle)
   2 minutes at 72°C

3) Repeat step 2 for 10 cycles

4) 30 seconds at 94°C
   90 seconds at 55°C
   2 minutes at 72°C

5) Repeat step 4 for 20 cycles

6) 7 minutes at 72°C for 1 cycle, and

7) lower temperature to 15°C for storage and further analysis.
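
The cycling program above can be expanded into an explicit per-cycle annealing schedule. The sketch below is illustrative only and rests on two labeled assumptions: that the first cycle of step 2 anneals at 45°C with 1°C added on each later cycle, and that "repeat for 10 cycles" means 10 cycles in total for that step. Ramp times between temperatures are ignored, and the function names are hypothetical.

```python
def annealing_schedule(start_c=45, step_c=1, ramp_cycles=10,
                       plateau_c=55, plateau_cycles=20):
    """Annealing temperature (°C) for every cycle of steps 2 to 5.

    Assumption: the first ramp cycle anneals at start_c and each later
    ramp cycle adds step_c.
    """
    ramp = [start_c + step_c * i for i in range(ramp_cycles)]
    return ramp + [plateau_c] * plateau_cycles


def total_hold_seconds(n_cycles, initial_s=60, denature_s=30,
                       anneal_s=90, extend_s=120, final_s=420):
    """Sum of programmed hold times, excluding ramping and the 15°C hold."""
    return initial_s + n_cycles * (denature_s + anneal_s + extend_s) + final_s


schedule = annealing_schedule()
hold_time_s = total_hold_seconds(len(schedule))
```

Under these assumptions the annealing temperature rises from 45°C to 54°C over the first ten cycles and then holds at 55°C for twenty cycles, for 7680 seconds (128 minutes) of programmed hold time.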

[207] The following DNA primers were constructed for use in amplification of homologous 
CBH1 genes from genomic DNAs isolated from various microorganisms. All symbols used 
herein for protein and DNA sequences correspond to IUPAC-IUB Biochemical Nomenclature
Commission codes. 

[208] Homologous 5' (FRG192) and 3' (FRG193) primers were developed based on the
sequence of CBH1 from Trichoderma reesei. Both primers contained Gateway cloning
sequences from Invitrogen® at the 5' end of the primer. Primer FRG192 contained the attB1
sequence and primer FRG193 contained the attB2 sequence.

Sequence of FRG192 without the attB1:

ATGTATCGGAAGTTGGCCG (signal sequence of CBH1 H. jecorina)

Sequence of FRG193 without the attB2:

TTACAGGCACTGAGAGTAG (cellulose binding module of CBH1 H. jecorina)
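
The orientation of the two primers relative to the coding strand can be checked with a short script. This is an illustrative sketch; the helper names are hypothetical, and only the two gene-specific sequences given above (without the attB sites) are used.

```python
FRG192 = "ATGTATCGGAAGTTGGCCG"  # 5' primer, gene-specific part
FRG193 = "TTACAGGCACTGAGAGTAG"  # 3' primer, gene-specific part

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


# FRG193 anneals to the coding strand, so its reverse complement shows
# the 3' end of the coding sequence.
coding_end = reverse_complement(FRG193)
```

The forward primer begins with the ATG start codon, and the reverse complement of FRG193 (CTACTCTCAGTGCCTGTAA) ends with a TAA stop codon, so the pair brackets the coding region from start to stop.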

[209] PCR conditions were as follows: 10 µL of 10X reaction buffer (10X reaction buffer
comprising 100 mM Tris-HCl, pH 8-8.5; 250 mM KCl; 50 mM (NH4)2SO4; 20 mM MgSO4); 0.2
mM each of dATP, dTTP, dGTP, dCTP (final concentration), 1 µL of 100 ng/µL genomic
DNA, 0.5 µL of Pwo polymerase (Boehringer Mannheim, Cat # 1644-947) at 1 unit per µL,
0.2 µM of each primer, FRG192 and FRG193 (final concentration), 4 µL DMSO and water to
100 µL.
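
The final concentrations implied by this recipe follow from the dilution of each stock into the 100 µL reaction. The sketch below is illustrative; the variable and function names are hypothetical, and the DMSO stock is assumed to be neat (100% v/v), which the text does not state.

```python
TOTAL_UL = 100.0

# Composition of the 10X reaction buffer, in mM, as given in the text.
BUFFER_10X_MM = {"Tris-HCl": 100.0, "KCl": 250.0, "(NH4)2SO4": 50.0, "MgSO4": 20.0}


def final_conc(stock_conc, volume_ul, total_ul=TOTAL_UL):
    """Final concentration after diluting volume_ul of stock into total_ul."""
    return stock_conc * volume_ul / total_ul


# 10 µL of 10X buffer in a 100 µL reaction.
buffer_final_mM = {name: final_conc(conc, 10.0) for name, conc in BUFFER_10X_MM.items()}

template_ng = 100.0 * 1.0             # 1 µL at 100 ng/µL
polymerase_units = 1.0 * 0.5          # 0.5 µL at 1 unit/µL
dmso_percent = final_conc(100.0, 4.0) # assumes neat DMSO stock
```

The reaction therefore contains 10 mM Tris-HCl, 25 mM KCl, 5 mM (NH4)2SO4 and 2 mM MgSO4, with 100 ng of template, 0.5 unit of polymerase and, under the stated assumption, 4% (v/v) DMSO.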

[210] These conditions ultimately yielded four genes from different species:

1. Hypocrea schweinitzii (CBS 243.63)

2. Hypocrea orientalis (PPRI 3894)

3. Trichoderma pseudokoningii (CBS 408.91)

4. Trichoderma konilangbra

Isolation of Cel7A gene sequences 

[211] The full length sequences were obtained directly by using the N terminal (FRG192) 
and C terminal (FRG193) primers. The full length DNA sequences were translated into three 
open reading frames using Vector NTI software. Comparisons of the DNA and protein sequences
to H. jecorina Cel7A were performed to identify the putative intron sequences. Translation of
the genomic DNA sequence without the intron sequences revealed the protein sequence of
the homologous CBH1s. Full length genes have been obtained and are provided in Figures 3,
5, 7 and 9. 
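
The step of removing putative introns and translating the remaining sequence can be sketched as follows. This is not the Vector NTI procedure used here: the intron coordinates are taken as given rather than inferred by comparison with H. jecorina Cel7A, the function names are hypothetical, and the genomic sequence is a toy example constructed for illustration, not a real gene.

```python
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons ordered T, C, A, G at each position.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def splice(genomic, introns):
    """Remove introns given as 0-based, half-open (start, end) intervals."""
    exons, pos = [], 0
    for start, end in sorted(introns):
        exons.append(genomic[pos:start])
        pos = end
    exons.append(genomic[pos:])
    return "".join(exons)


def translate(cds):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy example: two exons separated by a hypothetical GT...AG intron.
exon1, intron, exon2 = "ATGTATCGGAAG", "GTAAGTCATTCTAG", "TTGGCCTAA"
genomic = exon1 + intron + exon2
protein = translate(splice(genomic, [(len(exon1), len(exon1) + len(intron))]))
```

Translating the toy sequence without first removing the intron would read through non-coding bases, which is why the intron boundaries must be fixed before the protein sequence is deduced.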

Example 2 

Expression and thermostability of CBH1 homologs 

[212] The full-length genes from Example 1 were transferred to the A. niger Gateway 
compatible destination vector, which was developed by Genencor. This vector was built
using pRAX1, shown in Figure 11, as a backbone, according to the manual given in
Gateway™ Cloning Technology: version 1, pages 34-38.

[213] The newly developed expression vector is shown in Figure 12; this is a product of 
transferring the new genes into the destination vector pRAXdes2. This resulted in the final 
expression vectors called pRAXdesCBHI (specified with the species name).

[214] The constructs were transformed into A. niger var. awamori according to the
method described by Cao et al. (Cao Q-N, Stubbs M, Ngo KQP, Ward M, Cunningham A, Pai
EF, Tu G-C and Hofmann T (2000) Penicillopepsin-JT2, a recombinant enzyme from
Penicillium janthinellum and contribution of a hydrogen bond in subsite S3 to kcat. Protein
Science 9:991-1001).

[215] Transformants were streaked on minimal medium plates (Ballance DJ, Buxton FP, 
and Turner G (1983) Transformation of Aspergillus nidulans by the orotidine-5'-phosphate 
decarboxylase gene of Neurospora crassa. Biochem Biophys Res Commun 112:284-289)
and grown for 4 days at 30°C. Spores were collected using methods well known in the art 
(See <http://www.fgsc.net/fgn48/Kaminskyj.htm>). A. nidulans conidia are harvested in 
water (by rubbing the surface of a conidiating culture with a sterile bent glass rod to dislodge 
the spores) and can be stored for weeks to months at 4°C without a serious loss of viability. 
However, freshly harvested spores germinate more reproducibly. For long-term storage, 
spores can be stored in 50% glycerol at -20°C, or in 15-20% glycerol at -80°C. Glycerol is 
more easily pipetted as an 80% solution in water. 800 µl of aqueous conidial suspension (as
made for 4°C storage) added to 200 µl 80% glycerol is used for a -80°C stock; 400 µl
suspension added to 600 µl 80% glycerol is used for a -20°C stock. Vortex before freezing.
For mutant collections, small pieces of conidiating cultures can be excised and placed in 
20% glycerol, vortexed, and frozen as -80°C stocks. In our case we store them in 50% 
glycerol at -80°C. 
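
The final glycerol percentages of the two stock recipes above can be verified by a simple dilution calculation. The sketch below is illustrative and the function name is hypothetical.

```python
def final_glycerol_percent(suspension_ul, glycerol_stock_ul, stock_percent=80.0):
    """Final glycerol percentage after mixing an aqueous spore suspension
    with a glycerol stock solution."""
    return stock_percent * glycerol_stock_ul / (suspension_ul + glycerol_stock_ul)


minus80_stock = final_glycerol_percent(800.0, 200.0)  # recipe for -80°C storage
minus20_stock = final_glycerol_percent(400.0, 600.0)  # recipe for -20°C storage
```

The first recipe gives 16% glycerol, within the stated 15-20% range for -80°C storage, and the second gives 48%, close to the 50% stated for -20°C storage.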

[216] A. niger var. awamori transformants were grown on minimal medium lacking uridine
(Ballance et al. 1983). Transformants were screened for cellulase activity by inoculating
1 cm^2 of spore suspension from the sporulated agar plate into 100 ml shake flasks and growing for
3 days at 37°C as described by Cao et al. (2000).

[217] The CBHI activity assay is based on the hydrolysis of the nonfluorescent 4-
methylumbelliferyl-β-lactoside to the products lactose and 7-hydroxy-4-methylcoumarin; the
latter product is responsible for the fluorescent signal. Pipette 170 µl 50 mM NaAc buffer pH
4.5 in a 96-well microtiter plate (MTP) (Greiner, Fluotrac 200, art. nr. 655076) suitable for
fluorescence. Add 10 µl of supernatant and then add 10 µl of MUL (1 mM 4-
methylumbelliferyl-β-lactoside (MUL) in milliQ water) and put the MTP in the Fluostar Galaxy
(BMG Labtechnologies; D-77656 Offenburg). Measure the kinetics for 16 min. (8 cycles of
120 s each) using λ320 nm (excitation) and λ460 nm (emission) at 50°C. Supernatants having
CBH activity were then subjected to Hydrophobic Interaction Chromatography as described
in Example 5 below.
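
One way to reduce the kinetic read to a single activity figure is the slope of fluorescence against time. The sketch below is an assumption about data handling, not the procedure used in this Example: it fits a least-squares line to eight reads, assumes one read at the end of each 120 s cycle, and uses hypothetical signal values. It also computes the final MUL concentration in the well from the volumes given above.

```python
def slope(times_s, signal):
    """Least-squares slope of signal versus time (signal units per second)."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_y = sum(signal) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_s, signal))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return sxy / sxx


# Assumption: one read at the end of each of the 8 cycles of 120 s.
read_times_s = [120 * (i + 1) for i in range(8)]

# Hypothetical fluorescence values for illustration only.
example_signal = [1000 + 2.5 * t for t in read_times_s]
example_rate = slope(read_times_s, example_signal)

# Well composition from the text: 170 µl buffer + 10 µl supernatant + 10 µl MUL.
well_volume_ul = 170 + 10 + 10
final_mul_mM = 1.0 * 10 / well_volume_ul
```

The final substrate concentration in the well is about 0.053 mM. Comparing slopes between wells is only meaningful while the signal remains linear over the 16 minutes.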

[218] The amino acid sequences were deduced as stated above in Example 1. The amino
acid sequences for the CBH1 homologs are shown in Figures 4 (Hypocrea orientalis), 6
(Hypocrea schweinitzii), 8 (Trichoderma konilangbra) and 10 (Trichoderma pseudokoningii).

[219] Thermostability of the homologs was determined as described in Example 5 below.



CBH1 homolog                              % identity   Tm     ΔTm

Hypocrea jecorina                                      62.5
Hypocrea schweinitzii (CBS 243.63)        96.5         61.4   -1.1
Hypocrea orientalis (PPRI 3894)           97.1         62.8    0.3
Trichoderma pseudokoningii (CBS 408.91)   94.9         57.5   -5.0
Trichoderma konilangbra                   93.0         59.4   -3.1

Table 1: Tm measurements and comparison between the different CBH1 homologous
sequences.
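
The ΔTm column of Table 1 is the difference between each homolog's Tm and that of H. jecorina CBH1. The following sketch recomputes it from the tabulated Tm values; the variable names are hypothetical.

```python
REFERENCE = "Hypocrea jecorina"

# Tm values as listed in Table 1.
TM = {
    "Hypocrea jecorina": 62.5,
    "Hypocrea schweinitzii": 61.4,
    "Hypocrea orientalis": 62.8,
    "Trichoderma pseudokoningii": 57.5,
    "Trichoderma konilangbra": 59.4,
}

# Difference from the reference, rounded to the table's one decimal place.
delta_tm = {name: round(tm - TM[REFERENCE], 1)
            for name, tm in TM.items() if name != REFERENCE}

# Homologs with a lower Tm than the reference.
less_stable = sorted(name for name, d in delta_tm.items() if d < 0)
```

Only H. orientalis has a positive ΔTm; the other three homologs are less stable than the H. jecorina enzyme.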

[220] As can be seen, the CBH1 homologs showed thermal stability slightly higher (H. orientalis)
or lower (the other three homologs) than that of wild-type H. jecorina CBH1. The homologs are
closely related to H. jecorina CBH1; the thermal stability differences between H. jecorina and 
the homologs may indicate that sites with amino acid residues different from those found in 
H. jecorina CBH1 may be involved in thermostability. 

Example 3 

Identification of sites important for stability 

[221] The amino acid sequences of the CBH1 homologs characterized in Example 2, 
above, were aligned with the H. jecorina sequence with Vector NTI using the Clustal W
algorithm (Nucleic Acids Research, 22(22): 4673-4680, 1994). The alignment is shown
in Figure 2. 

[222] Possible sites involved in the stability of the CBH1 enzyme were determined in three
different ways based on alignment of the sequences of the homologs with CBH1. In the first
method, sites that differed between the H. jecorina CBH1 catalytic domain and the catalytic 
domain of at least one of the homologs of lower stability (i.e., excluding only H. orientalis) 
were identified as possible sites involved in the thermostability of CBH1. The sites identified 
were L6, P13, T24, Q27, S47, T59, T66, G88, N89, T160, Q186, S195, T232, E236, E239, 
G242, D249, N250, T281, E295, F311, E325, N327, D329, T332, A336, K354, V407, P412, 
T417 and/or F418 in CBH1 from Hypocrea jecorina. 

[223] In the second method, sites where the residue in H. jecorina OR H. orientalis is the
same as that found in all of the decreased-stability homologs were taken to
lack correlation with Tm and were excluded. The sites identified as retaining
relevance to stability were L6, T24, Q27, S47, T59, T66, T160, Q186, S195, T232, E236,
G242, D249, T281, E295, E325, N327, D329, T332, K354, and/or P412 in CBH1 from
Hypocrea jecorina.

[224] In the final method, sites where H. jecorina AND H. orientalis are the same, with the
corresponding residue in H. schweinitzii being either the same as or different from that in
these two, but with a different amino acid in the corresponding site of either T. konilangbra or T.
pseudokoningii, were considered as possible sites involved in thermostability of the enzyme.
These sites, which empirically showed the best correlation with Tm stability, were identified 
as Q186, S195, E325, T332 and P412. 
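
The first and final selection methods can be expressed as column filters over the alignment. The sketch below is illustrative only: the six-residue alignment is a hypothetical toy, not the alignment of Figure 2, the function names are hypothetical, and positions are numbered by alignment column on the assumption that there are no gaps.

```python
# Hypothetical toy alignment, one string per enzyme, no gaps.
TOY_ALIGNMENT = {
    "H. jecorina":       "QSETPL",
    "H. orientalis":     "QSETPV",
    "H. schweinitzii":   "QAETPL",
    "T. konilangbra":    "ESETAL",
    "T. pseudokoningii": "QSDTPL",
}

LOWER_STABILITY = ("H. schweinitzii", "T. konilangbra", "T. pseudokoningii")


def first_method(aln):
    """Sites where H. jecorina differs from at least one lower-stability homolog."""
    ref = aln["H. jecorina"]
    return [f"{res}{i + 1}" for i, res in enumerate(ref)
            if any(aln[name][i] != res for name in LOWER_STABILITY)]


def final_method(aln):
    """Sites where H. jecorina and H. orientalis agree but T. konilangbra or
    T. pseudokoningii differs; H. schweinitzii is not considered."""
    ref = aln["H. jecorina"]
    return [f"{res}{i + 1}" for i, res in enumerate(ref)
            if aln["H. orientalis"][i] == res
            and (aln["T. konilangbra"][i] != res
                 or aln["T. pseudokoningii"][i] != res)]


first_sites = first_method(TOY_ALIGNMENT)
final_sites = final_method(TOY_ALIGNMENT)
```

On the toy alignment the first method returns Q1, S2, E3 and P5, while the final method drops S2 because only H. schweinitzii differs there. A real alignment with gaps would need a mapping from alignment column to H. jecorina residue number.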

[225] The identified sites with amino acid residues different from those found in H.
jecorina CBH1 were therefore subjected to site saturated mutagenesis.

Example 4 
Expression of CBH1 variants 

[226] The PCR fragments were obtained using the primers and protocols described in 
Example 1. The fragments were purified from an agarose gel using the Qiagen Gel 
Extraction Kit. The purified fragments were used to perform a clonase reaction with the 
pDONR™201 vector from Invitrogen® using the Gateway™ Technology instruction manual 
(version C) from Invitrogen®, hereby incorporated by reference herein. Genes were then 
transferred from this ENTRY vector to the destination vector (pRAXdes2) to obtain the 
expression vector pRAXCBH1. 

[227] Cells were transformed with an expression vector comprising a desired cellulase 
encoding nucleic acid. The host cells, A. niger, were then grown under conditions permitting 
expression of the desired cellulase as described in Example 2. 

[228] The sites that differ from H. jecorina CBH1, as identified in Example 3, may be involved 
in the thermostability of the variants and were therefore subjected to site saturated 
mutagenesis. 

Example 5 
Thermostability of CBH1 variants 

[229] CBH I cellulase variants are cloned and expressed as above (see Example 4). 
Cel7A wild type and variants are then purified from cell-free supernatants of these cultures 
by column chromatography. Proteins are purified using hydrophobic interaction 
chromatography (HIC). Columns are run on a BioCAD® Sprint Perfusion Chromatography 
System using Poros® 20 HP2 resin, both made by Applied Biosystems. 

[230] HIC columns are equilibrated with 5 column volumes of 0.020 M sodium phosphate, 
0.5 M ammonium sulfate at pH 6.8. Ammonium sulfate is added to the supernatants to a 
final concentration of approximately 0.5 M and the pH is adjusted to 6.8. After filtration, the 
supernatant is loaded onto the column. After loading, the column is washed with 10 column 
volumes of equilibration buffer and then eluted with a 10 column volume gradient from 0.5 M 
ammonium sulfate to zero ammonium sulfate in 0.02 M sodium phosphate pH 6.8. Cel7A is 
eluted approximately mid-gradient. Fractions are collected and pooled on the basis of 
reduced SDS-PAGE gel analysis. 
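
The elution step can be expressed as simple arithmetic. The Python sketch below assumes the gradient is linear, which the text does not state explicitly, and the function name `ammonium_sulfate_molar` is hypothetical. Under that assumption, elution at approximately mid-gradient corresponds to roughly 0.25 M ammonium sulfate.

```python
START_MOLAR = 0.5    # ammonium sulfate at the start of the gradient (M)
END_MOLAR = 0.0      # ammonium sulfate at the end of the gradient (M)
GRADIENT_CV = 10.0   # gradient length in column volumes


def ammonium_sulfate_molar(cv_into_gradient):
    """Ammonium sulfate concentration (M), assuming a LINEAR gradient."""
    if not 0.0 <= cv_into_gradient <= GRADIENT_CV:
        raise ValueError("position must lie within the 10 column volume gradient")
    fraction = cv_into_gradient / GRADIENT_CV
    return START_MOLAR + (END_MOLAR - START_MOLAR) * fraction


# Concentration halfway through the gradient, where Cel7A is reported to elute.
mid_gradient_molar = ammonium_sulfate_molar(5.0)
print(mid_gradient_molar)
```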

[231] The melting points are determined according to the methods of Luo, et al., 
Biochemistry 34:10669 and Gloss, et al., Biochemistry 36:5612. 

[232] Data is collected on the Aviv 215 circular dichroism spectrophotometer. Spectra of 
the variants between 210 and 260 nanometers are taken at 25° C. Buffer conditions are 50 
mM Bis Tris Propane/50 mM ammonium acetate/glacial acetic acid at pH 5.5. The protein 
concentration is kept between 0.25 and 0.5 mg/mL. After determining the optimal 
wavelength to monitor unfolding, the samples are thermally denatured by ramping the 
temperature from 25° C to 75° C under the same buffer conditions. Data is collected for 5 
seconds every 2 degrees. Partially reversible unfolding is monitored at 230 nanometers in 
a 0.1 centimeter path length cell. 
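
The temperature ramp described above implies a fixed set of measurement points, which the Python sketch below enumerates. It also shows one simple way to read an apparent midpoint from a curve of fraction unfolded, by linear interpolation at the 0.5 crossing. This interpolation is an illustrative assumption and is not the fitting procedure of the cited references. The melting curve is synthetic, and the names `ramp_temperatures` and `midpoint_temperature` are hypothetical.

```python
import math


def ramp_temperatures(start=25, stop=75, step=2):
    """Temperatures (degrees C) at which data are collected during the ramp."""
    return list(range(start, stop + 1, step))


def midpoint_temperature(temps, fraction_unfolded):
    """Temperature where the fraction unfolded crosses 0.5 (linear interpolation)."""
    points = list(zip(temps, fraction_unfolded))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 < 0.5 <= f1:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    return None  # no transition observed within the ramp


temps = ramp_temperatures()
acquisition_seconds = len(temps) * 5  # 5 seconds of data per temperature point

# Synthetic two-state curve with an arbitrary midpoint (NOT measured data).
SYNTHETIC_TM = 62.0
SYNTHETIC_WIDTH = 2.0
curve = [1.0 / (1.0 + math.exp(-(t - SYNTHETIC_TM) / SYNTHETIC_WIDTH)) for t in temps]

apparent_tm = midpoint_temperature(temps, curve)
print(len(temps), acquisition_seconds, apparent_tm)
```

The acquisition time counts only the 5 second collection windows and ignores any time the instrument spends changing temperature or equilibrating.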

[233] The mutations introduced into the CBH I cellulase variants have a positive effect on 
the thermal stability of the variant CBH I cellulases compared to wild type. 

[234] It is understood that the examples and embodiments described herein are for 
illustrative purposes only and that various modifications or changes in light thereof will be 
suggested to persons skilled in the art and are to be included within the spirit and purview of 
this application and scope of the appended claims. All publications, patents, and patent 
applications cited herein are hereby incorporated by reference in their entirety for all 
purposes. 